Subject: Re: Cygwin fails to utilize Unicode replacement character
From: Thomas Wolff
To: cygwin@cygwin.com
Date: Tue, 04 Sep 2018 21:43:00 -0000

On 04.09.2018 at 21:53, Steven Penny wrote:
> On Tue, 4 Sep 2018 20:41:48, Thomas Wolff wrote:
> ...
>> the .notdef glyph is not an appropriate indication of illegal
>> encoding (like broken UTF-8 bytes)
>
> true, but neither is U+2592.
> as far as i know U+2592 is not defined officially anywhere as being a
> representation of anything other than "MEDIUM SHADE".

Traditionally, many terminals displayed the DEL character as a checkered block, which is more or less the MEDIUM SHADE glyph. By that convention, the glyph looks somewhat "erroneous".

> Corinna originally added it in 2009:
>
> http://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git&a=commitdiff&h=161211d
>
> with no justification of why it was chosen that i can tell.

The justification is the traditional usage of the symbol described above.

> similarly, mintty actually changed from U+FFFD to U+2592 in 2009:
>
> http://github.com/mintty/mintty/commit/90c11d3
>
> with actually a good reason, which was to avoid ambiguity with fonts that
> didn't have U+FFFD. but again, no reason why U+2592 was chosen. i personally
> see both sides of the argument but i tend to land on the side of any
> standards if they exist.
> Here is the standard for U+FFFD:
>
> http://unicode.org/charts/nameslist/n_FFF0.html

FFFD  �  Replacement Character
  •  used to replace an incoming character whose value is unknown or unrepresentable in Unicode

>
> if we were to use something other than U+FFFD, I would propose U+25A1, as
> it is also defined by Unicode:
>
>    25A1  □  White Square
>    •  may be used to represent a missing ideograph
>
> http://unicode.org/charts/nameslist/n_25A0.html

Quoting yourself from your other response:
> U+2592 MEDIUM SHADE is *only* used in cases of invalid UTF-8. In case
> of missing character - the ".notdef" glyph is used

This is my point.
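To make the two cases concrete, here is a minimal Python sketch. It is only an illustration, not how Cygwin or mintty actually render text. The error handler name `medium_shade`, the `render` helper and the `FONT` coverage set are hypothetical, and the font's .notdef glyph is approximated by WHITE SQUARE (U+25A1). Invalid UTF-8 bytes are mapped to MEDIUM SHADE through a custom handler registered with `codecs.register_error`; valid code points the font lacks become the .notdef box. Using Python's built-in `"replace"` handler (which inserts U+FFFD) with a font that has no U+FFFD glyph shows both cases collapsing into the same box.

```python
import codecs

MEDIUM_SHADE = "\u2592"  # drawn for invalid (undecodable) UTF-8 input
NOTDEF_BOX = "\u25a1"    # stand-in for the font's .notdef glyph (WHITE SQUARE)

def _medium_shade(err: UnicodeDecodeError) -> tuple[str, int]:
    # Replace the offending byte range with one MEDIUM SHADE, resume after it.
    return MEDIUM_SHADE, err.end

codecs.register_error("medium_shade", _medium_shade)

# Hypothetical console font: printable ASCII plus MEDIUM SHADE, no U+FFFD.
FONT = set(range(0x20, 0x7F)) | {ord(MEDIUM_SHADE)}

def render(data: bytes, font: set[int], errors: str = "medium_shade") -> str:
    """What a console without font fallback would show for `data`.

    `errors` is a codec error handler name: "medium_shade" (the custom
    handler above) or Python's built-in "replace", which inserts U+FFFD.
    Any decoded character the font lacks is drawn as the .notdef box.
    """
    text = data.decode("utf-8", errors=errors)
    return "".join(ch if ord(ch) in font else NOTDEF_BOX for ch in text)

# b"\xff" is an invalid UTF-8 byte; b"\xe4\xb8\xad" is valid U+4E2D (not in FONT).
sample = b"A\xffB\xe4\xb8\xad"
print(ascii(render(sample, FONT)))             # 'A\u2592B\u25a1' (cases distinct)
print(ascii(render(sample, FONT, "replace")))  # 'A\u25a1B\u25a1' (cases folded)
```

The font coverage set is an assumption standing in for a console font without fallback; with a font that does contain U+FFFD, the second call would stay distinguishable.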
We have two use cases here:
  invalid code point (e.g. broken UTF-8 bytes) -> MEDIUM SHADE
  valid code point with no glyph in the font -> .notdef glyph -> WHITE SQUARE

Now if you switch to FFFD REPLACEMENT CHARACTER for invalid code points, then, since FFFD does not exist in most actual fonts and the console does not apply font fallback, it will resolve to WHITE SQUARE as well. That folds the two different use cases into the same appearance, which is bad.

Thomas