public inbox for cygwin@cygwin.com
From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: cygwin@cygwin.com
Subject: Re: Unicode width data inconsistent/outdated
Date: Mon, 07 Aug 2017 10:41:00 -0000
Message-ID: <20170807104127.GT25551@calimero.vinschen.de>
In-Reply-To: <20170807092820.GQ25551@calimero.vinschen.de>

On Aug  7 11:28, Corinna Vinschen wrote:
> On Aug  5 21:06, Thomas Wolff wrote:
> > Am 04.08.2017 um 19:01 schrieb Corinna Vinschen:
> > > This shouldn't matter to you, just keep it in place.  It's a historical,
> > > low-footprint conversion for Japanese characters without pulling in the
> > > Unicode stuff.  Not used on Cygwin so just ignore.
> > I had noticed meanwhile that this is not active in Cygwin, but it's broken
> > anyway for multiple reasons:
> >    * platforms for which wchar_t is not Unicode should be explicitly listed
> >    * if used, the transformation needs to be applied to all non-Unicode
> > locales (also Chinese, Korean, and even 8-bit locales such as *.CP1252)
> >    * for towupper and towlower, the result must be back-transformed into the
> > respective locale encoding
> >    * particularly the locale-specific _l functions inconsistently do not use
> > the transformation but have this note:
> 
> No, no, no.  The functionality is restricted to certain use-cases and
> always was.  It was a paid-for customer extension back in the day and it
> was *sufficient* for the use-cases.  It's not clear how many newlib
> users are still using it, but it's not a good idea to remove it without
> checking first.  That means, ask on the newlib mailing list how many are
> using the historical jp2uc code, and if we don't get a reply within,
> say, a month, we can probably nuke it.

To clarify where we're coming from:

If you look into newlib/libc/locale/locale.c, function __loadlocale,
you'll notice that outside of Cygwin, only six single-, double-, or
multibyte codesets are supported at all:

  ASCII
  ISO-8859-1
  EUCJP
  JIS
  SJIS
  UTF-8

The multichar/widechar conversion functions for EUCJP, JIS and SJIS were
implemented to have a low footprint in the first place, see, for
instance, __sjis_wctomb in newlib/libc/stdlib/wctomb_r.c.
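
To illustrate what "low footprint" means here, the following is a
minimal sketch, not the newlib source.  It assumes the wide char simply
carries the raw SJIS byte pair (lead byte in the high 8 bits, trail
byte in the low 8 bits), so no Unicode mapping table is needed.  The
names sketch_sjis_wctomb, is_sjis_lead and is_sjis_trail are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Lead-byte range of Shift_JIS double-byte characters. */
static int is_sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xef);
}

/* Trail-byte range of Shift_JIS double-byte characters. */
static int is_sjis_trail(unsigned char c)
{
    return (c >= 0x40 && c <= 0x7e) || (c >= 0x80 && c <= 0xfc);
}

/* Hypothetical helper: writes the SJIS bytes for a wide value that is
   assumed to hold (lead << 8) | trail.  Returns the number of bytes
   written, or -1 if the value is not a valid byte pair. */
int sketch_sjis_wctomb(unsigned char *out, unsigned int wc)
{
    unsigned char lead = (unsigned char)(wc >> 8);
    unsigned char trail = (unsigned char)wc;

    assert(out != NULL);
    if (wc > 0xffffu)
        return -1;               /* does not fit in two bytes */
    if (lead != 0) {
        if (!is_sjis_lead(lead) || !is_sjis_trail(trail))
            return -1;           /* not a valid double-byte pair */
        out[0] = lead;
        out[1] = trail;
        return 2;
    }
    out[0] = trail;              /* single-byte character */
    return 1;
}
```

In this sketch the conversion is pure bit shuffling plus two range
checks, which is why it stays small on constrained targets.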

This is all about simplification for small targets.  There was never a
requirement that converting a UTF-8 char to wchar_t and converting the
equivalent SJIS char to wchar_t would yield the same wide char.
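
As a hedged illustration of that point, the sketch below decodes the
same character, HIRAGANA LETTER A, once from UTF-8 (bytes E3 81 82) and
once from SJIS (bytes 82 A0).  It assumes the SJIS side just packs the
two bytes into the wide value, as a table-free conversion would; both
function names are hypothetical and this is not the newlib code.

```c
#include <assert.h>
#include <stddef.h>

/* Decodes one 3-byte UTF-8 sequence into its Unicode code point.
   Returns 0 if the bytes do not have the shape of a 3-byte sequence
   (overlong forms and surrogates are not checked in this sketch). */
unsigned int sketch_utf8_3byte_to_wide(const unsigned char *s)
{
    assert(s != NULL);
    if ((s[0] & 0xf0) != 0xe0 || (s[1] & 0xc0) != 0x80
        || (s[2] & 0xc0) != 0x80)
        return 0;
    return ((unsigned int)(s[0] & 0x0f) << 12)
         | ((unsigned int)(s[1] & 0x3f) << 6)
         |  (unsigned int)(s[2] & 0x3f);
}

/* Packs a 2-byte SJIS sequence into a wide value without any
   table lookup: lead byte in the high 8 bits, trail byte in the low. */
unsigned int sketch_sjis_2byte_to_wide(const unsigned char *s)
{
    assert(s != NULL);
    return ((unsigned int)s[0] << 8) | (unsigned int)s[1];
}
```

Under these assumptions the UTF-8 input yields 0x3042 (the Unicode code
point) while the SJIS input yields 0x82A0, so the two wide values
differ even though they denote the same character.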

Consequently, Cygwin does not use these conversion functions.  Instead,
it uses the Windows conversion functions (see winsup/cygwin/strfuncs.cc)
to get a consistent wide char representation (UTF-16).  A further side
effect is that Cygwin does not support JIS at all, only SJIS; see the
comment in strfuncs.cc.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
