From: Brian Inglis <Brian.Inglis@SystematicSw.ab.ca>
To: cygwin-patches@cygwin.com
Subject: Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Date: Fri, 4 Sep 2020 08:05:13 -0600
Message-ID: <79c66b27-1d0c-28b7-b5e1-6822b08faf9e@SystematicSw.ab.ca>
In-Reply-To: <20200904124400.GQ4127@calimero.vinschen.de>

On 2020-09-04 06:44, Corinna Vinschen wrote:
> Hi Takashi,
>
> On Sep 4 18:21, Takashi Yano via Cygwin-patches wrote:
>> Hi Corinna,
>>
>> On Thu, 3 Sep 2020 19:59:12 +0200
>> Corinna Vinschen wrote:
>>> The only idea I had so far was, changing the way __set_charset_from_locale
>>> works from within _setlocale_r:
>>>
>>> We could add a Cygwin-specific function only fetching the codepage and
>>> call it unconditionally from _setlocale_r. __set_charset_from_locale is
>>> called with a new parameter "codepage", so it doesn't have to fetch the
>>> CP by itself, but it's still only called from _setlocale_r if necessary.
>>>
>>> Would that be sufficient? The CP conversion from 20127/ASCII to 65001/UTF8
>>> could be done at the point the codepage is actually required.
>>
>> I think I have found the answer to your request.
>> Patch attached. What do you think of this patch?
>>
>> Calling initial_setlocale() is necessary because
>> nl_langinfo() always returns "ANSI_X3.4-1968"
>> regardless of the locale setting if this is not called.
>
> Well, this is correct. Per POSIX, a standards-conformant application is
> running in the "C" locale unless it calls setlocale() explicitly.
> That's one reason Cygwin defaults to UTF-8 internally. Everything,
> including the terminal, is supposed to default to UTF-8. That's the
> most sane default, even if an application is not locale-aware.
>
> So, to follow POSIX, initial_setlocale() is used to set up the
> environment and command line stuff and then, before calling the
> application's main, Cygwin calls _setlocale_r (_REENT, LC_CTYPE, "C");
> to reset the app's default locale to "C". That's why nl_langinfo()
> returns "ANSI_X3.4-1968".
>
> However, the initial_setlocale() call in dll_crt0_1 calls
> internal_setlocale(), and *that* function sets the conversion functions
> for the internal conversions. What it *doesn't* do yet is
> to store the charset name itself or, better, the equivalent codepage.
>
> If we change that, setup_locale can simply go away. Below is a patch
> doing just that. Can you please check if that works in your test
> scenarios?
>
> However, there's something which worries me. Why do we need or set the
> pseudo terminal codepage at all? I see that you convert from MB charset
> to MB charset and then use the result in WriteFile to the connecting
> pipes. Question is this: Why not just convert the strings via
> sys_mbstowcs to a UTF-16 string and then send that over the line, using
> WriteConsoleW for the final output to the console? That would simplify
> this stuff quite a bit, wouldn't it? After all, for writing UTF-16 to
> the console, we simply don't need to know or care about the console CP.
IIRC his locale was ja_JP.UTF-8 but he got English messages on CP 932!
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]