From: Thomas Wolff
To: cygwin@cygwin.com
Subject: Re: Unicode width data inconsistent/outdated
Date: Sat, 05 Aug 2017 20:53:00 -0000
Message-ID: <1018cbbf-e04d-3207-cafe-5a40c630bfa6@towo.net>
In-Reply-To: <1f320064-0f25-8a41-4ded-49bd750edae5@SystematicSw.ab.ca>

On 05.08.2017 at 22:24, Brian Inglis wrote:
> On 2017-08-05 13:06, Thomas Wolff wrote:
> ...
>> Which other platforms do actually use newlib?
> Many historical uPs and current uCs used in embedded systems supporting gcc not
> using Linux, including RTEMS, devKits for Nintendo and Sony game systems, some
> Android, Google NaCl.
Do they all treat wchar_t as locale-specifically encoded? I doubt that.
https://www.gnu.org/software/libunistring/manual/html_node/The-wchar_005ft-mess.html
particularly points out Solaris and FreeBSD, no others.

>>>> Issue 3 is the special conversion jp2uc which seems to be half-bred;
>>>> there is no such handling for Chinese or Korean.
>>> This shouldn't matter to you, just keep it in place. It's a historical, low
>>> footprint conversion for Japanese characters without pulling in the unicode
>>> stuff. Not used on Cygwin so just ignore.
>> I had noticed meanwhile that this is not active in Cygwin, but it's broken
>> anyway for multiple reasons:
>> * platforms for which wchar_t is not Unicode should be explicitly listed
>> * if used, the transformation needs to be applied to all non-Unicode locales
>>   (also Chinese, Korean, and even 8-bit locales such as *.CP1252)
>> * for towupper and towlower, the result must be back-transformed into the
>>   respective locale encoding
>> * particularly the locale-specific _l functions inconsistently do not use the
>>   transformation but have this note:
>>> We're using a locale-independent representation of upper/lower case based
>>> on Unicode data. Thus, the locale doesn't matter.
>> So I'd suggest to drop that stuff unless someone would like to fix it.
> Looks like JIS support is under newlib/iconvdata
So maybe the conversion can call jisx0201_to_ucs4 etc. from there, and
the back-conversion for towupper/lower is available there as well.
But then the stuff is still broken for the other reasons. I could map
the _l functions properly, if that's really desired, but how should other
encodings be handled, and on which platforms?

Thomas
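
To illustrate the back-transformation point for towupper, here is a minimal
sketch, assuming a hypothetical platform where wchar_t holds CP1252 code
units. All helper names (cp1252_to_ucs, ucs_to_cp1252, ucs_toupper,
cp1252_toupper) are invented for illustration and are not newlib functions.
The table covers only a few entries of the 0x80..0x9F block, and ucs_toupper
is a toy stand-in for the real Unicode case data.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of the CP1252 0x80..0x9F block; NOT a complete table. */
struct cp_pair { uint8_t cp; uint32_t ucs; };
static const struct cp_pair cp1252_high[] = {
    { 0x8A, 0x0160 }, /* S with caron, upper */
    { 0x8C, 0x0152 }, /* OE ligature, upper */
    { 0x8E, 0x017D }, /* Z with caron, upper */
    { 0x9A, 0x0161 }, /* s with caron, lower */
    { 0x9C, 0x0153 }, /* oe ligature, lower */
    { 0x9E, 0x017E }, /* z with caron, lower */
    { 0x9F, 0x0178 }, /* Y with diaeresis, upper */
};
#define N_HIGH (sizeof cp1252_high / sizeof cp1252_high[0])

/* Hypothetical helper: CP1252 code unit -> Unicode, -1 if not in this sketch. */
static long cp1252_to_ucs(unsigned c)
{
    if (c < 0x80 || (c >= 0xA0 && c <= 0xFF))
        return (long)c;                 /* same value as in Unicode */
    for (size_t i = 0; i < N_HIGH; i++)
        if (cp1252_high[i].cp == c)
            return (long)cp1252_high[i].ucs;
    return -1;
}

/* Hypothetical helper: Unicode -> CP1252 code unit, -1 if not representable. */
static long ucs_to_cp1252(uint32_t u)
{
    if (u < 0x80 || (u >= 0xA0 && u <= 0xFF))
        return (long)u;
    for (size_t i = 0; i < N_HIGH; i++)
        if (cp1252_high[i].ucs == u)
            return (long)cp1252_high[i].cp;
    return -1;
}

/* Toy stand-in for Unicode-based upper-casing (assumption, tiny subset). */
static uint32_t ucs_toupper(uint32_t u)
{
    if (u >= 'a' && u <= 'z')
        return u - 0x20;
    if (u >= 0xE0 && u <= 0xFE && u != 0xF7)   /* Latin-1 letters, not the division sign */
        return u - 0x20;
    if (u == 0xFF)
        return 0x0178;                          /* y diaeresis -> Y diaeresis */
    if (u == 0x0161 || u == 0x0153 || u == 0x017E)
        return u - 1;
    return u;
}

/* Transform to Unicode, map case, then back-transform into the locale
   encoding. If either conversion fails, return the input unchanged. */
unsigned cp1252_toupper(unsigned c)
{
    long u = cp1252_to_ucs(c);
    if (u < 0)
        return c;
    long back = ucs_to_cp1252(ucs_toupper((uint32_t)u));
    return back < 0 ? c : (unsigned)back;
}
```

With this sketch, cp1252_toupper(0x9A) (s with caron) yields 0x8A, the CP1252
code for the upper-case letter; returning the Unicode value 0x0160 instead
would be wrong in such a locale, which is the reason for the back-transformation.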