From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: cygwin@cygwin.com
Subject: Re: Unicode width data inconsistent/outdated
Date: Mon, 07 Aug 2017 10:41:00 -0000
Message-ID: <20170807104127.GT25551@calimero.vinschen.de>
In-Reply-To: <20170807092820.GQ25551@calimero.vinschen.de>
On Aug 7 11:28, Corinna Vinschen wrote:
> On Aug 5 21:06, Thomas Wolff wrote:
> > Am 04.08.2017 um 19:01 schrieb Corinna Vinschen:
> > > This shouldn't matter to you, just keep it in place. It's a historical,
> > > low footprint conversion for Japanese characters without pulling in the
> > > Unicode stuff. Not used on Cygwin so just ignore.
> > I had noticed meanwhile that this is not active in Cygwin, but it's broken
> > anyway for multiple reasons:
> > * platforms for which wchar_t is not Unicode should be explicitly listed
> > * if used, the transformation needs to be applied to all non-Unicode
> > locales (also Chinese, Korean, and even 8-bit locales such as *.CP1252)
> > * for towupper and towlower, the result must be back-transformed into the
> > respective locale encoding
> > * particularly the locale-specific _l functions inconsistently do not use
> > the transformation but have this note:
>
> No, no, no. The functionality is restricted to certain use-cases and
> always was. It was a paid-for customer extension back in the day and it
> was *sufficient* for the use-cases. It's not clear how many newlib
> users are still using it, but it's not a good idea to remove it without
> checking first. That means, ask on the newlib mailing list how many are
> using the historical jp2uc code, and if we don't get a reply within,
> say, a month, we can probably nuke it.
To clarify where we're coming from:
If you look into newlib/libc/locale/locale.c, function __loadlocale,
you'll notice that outside of Cygwin, only six single-, double-, or
multi-byte codesets are supported at all:
ASCII
ISO-8859-1
EUCJP
JIS
SJIS
UTF-8
The multibyte/wide-char conversion functions for EUCJP, JIS and SJIS were
implemented with a low footprint in mind from the start; see, for
instance, __sjis_wctomb in newlib/libc/stdlib/wctomb_r.c.
This is all about simplification for small targets. There was never a
requirement that converting a UTF-8 char to wchar_t and converting the
equivalent SJIS char to wchar_t would result in the same wide char.
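To illustrate the point about differing wide chars, here is a minimal
sketch. It is not newlib's actual code: the names sketch_sjis_to_wc and
sketch_utf8_to_wc are made up for this example, and it assumes the
low-footprint SJIS path simply packs the lead and trail byte into one
value rather than looking up a Unicode code point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical low-footprint SJIS decoder (illustration only).
   A double-byte character is packed as (lead << 8) | trail,
   with no Unicode table lookup.  Returns bytes consumed or -1. */
int sketch_sjis_to_wc(uint32_t *pwc, const unsigned char *s, size_t n)
{
    if (n == 0)
        return -1;
    unsigned char lead = s[0];
    /* SJIS lead bytes: 0x81..0x9F and 0xE0..0xEF */
    if ((lead >= 0x81 && lead <= 0x9F) || (lead >= 0xE0 && lead <= 0xEF)) {
        if (n < 2)
            return -1;
        unsigned char trail = s[1];
        /* SJIS trail bytes: 0x40..0x7E and 0x80..0xFC */
        if (!((trail >= 0x40 && trail <= 0x7E) ||
              (trail >= 0x80 && trail <= 0xFC)))
            return -1;
        *pwc = ((uint32_t)lead << 8) | trail;
        return 2;
    }
    /* Single-byte character: passed through unchanged. */
    *pwc = lead;
    return 1;
}

/* Hypothetical UTF-8 decoder yielding a Unicode code point.
   Simplified: checks continuation bytes only, does not reject
   overlong forms or surrogates.  Returns bytes consumed or -1. */
int sketch_utf8_to_wc(uint32_t *pwc, const unsigned char *s, size_t n)
{
    if (n == 0)
        return -1;
    unsigned char c = s[0];
    int len;
    uint32_t cp;
    if (c < 0x80)                { len = 1; cp = c; }
    else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
    else
        return -1;
    if (n < (size_t)len)
        return -1;
    for (int i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *pwc = cp;
    return len;
}
```

For HIRAGANA LETTER A, the SJIS bytes 0x82 0xA0 come out as 0x82A0 under
this packing, while the UTF-8 bytes 0xE3 0x81 0x82 decode to 0x3042. Both
are "the same character", but the resulting wide chars differ.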
Consequently, Cygwin does not use these conversion functions. Rather,
it uses the Windows conversion functions (see the conversion functions in
winsup/cygwin/strfuncs.cc) to get a consistent wide char representation
(UTF-16). Another side effect is that Cygwin does not support JIS at
all, only SJIS; see the comment in strfuncs.cc.
Corinna
--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat