From: "carlos at redhat dot com"
To: libc-locales@sourceware.org
Subject: [Bug localedata/31370] wcwidth() does not treat DEFAULT_IGNORABLE_CODE_POINTs as zero-width
Date: Mon, 12 Feb 2024 13:45:07 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=31370

Carlos O'Donell changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-02-12
                 CC|                            |carlos at redhat dot com

--- Comment #2 from Carlos O'Donell ---
(In reply to Jules Bertholet from comment #0)
> Unicode specifies (https://www.unicode.org/faq/unsup_char.html#3) that
> characters with the `Default_Ignorable_Code_Point` property
>
> > should be rendered as completely invisible (and non advancing, i.e. “zero width”), if not explicitly supported in rendering.
>
> Hence, `wcwidth()` should give them all a width of 0, with two exceptions:

Please provide a patch to libc-alpha@sourceware.org following:
https://sourceware.org/glibc/wiki/Contribution%20checklist

Please also provide justification for the zero width by quoting another
implementation that also provides zero width, e.g. CLDR.

The goal is for glibc to harmonize more closely with CLDR.

It seems sensible to me that they would be zero width if they are
non-advancing, but that isn't always what an end user needs (as seen below).

> - the soft hyphen (U+00AD SOFT HYPHEN) is assigned width 1 by longstanding
> precedent

We use 1 in UTF-8 (default width). So this matches. The expectation is that
the system is trying to determine a width where the hyphen is chosen during
the display process.

> - U+115F HANGUL CHOSEONG FILLER combines with jungseong and jongseong jamo
> to form a width-2 syllable block, and should therefore keep its width 2

We use 2 in UTF-8. So this matches.

... 2

> However, `wcwidth()` currently also incorrectly assigns non-zero width to
> U+3164 HANGUL FILLER and U+FFA0 HALFWIDTH HANGUL FILLER.

This needs justification by highlighting that we are harmonizing the
implementation with CLDR.

Currently we have:

... 2

While U+FFA0 has the default width of 1.

Thanks for filing this issue.