From: Takashi Yano <takashi.yano@nifty.ne.jp>
To: cygwin-patches@cygwin.com
Subject: Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Date: Fri, 4 Sep 2020 23:50:16 +0900
Message-ID: <20200904235016.9c34d04e809b5ad9f2bdfdf3@nifty.ne.jp>
In-Reply-To: <20200904124400.GQ4127@calimero.vinschen.de>

Hi Corinna,

On Fri, 4 Sep 2020 14:44:00 +0200
Corinna Vinschen wrote:
> Hi Takashi,
> 
> On Sep  4 18:21, Takashi Yano via Cygwin-patches wrote:
> > Hi Corinna,
> > 
> > On Thu, 3 Sep 2020 19:59:12 +0200
> > Corinna Vinschen wrote:
> > > The only idea I had so far was, changing the way __set_charset_from_locale
> > > works from within _setlocale_r:
> > > 
> > > We could add a Cygwin-specific function only fetching the codepage and
> > > call it unconditionally from _setlocale_r.  __set_charset_from_locale is
> > > called with a new parameter "codepage", so it doesn't have to fetch the
> > > CP by itself, but it's still only called from _setlocale_r if necessary.
> > > 
> > > Would that be sufficient?  The CP conversion from 20127/ASCII to 65001/UTF8
> > > could be done at the point the codepage is actually required.
> > 
> > I think I have found the answer to your request.
> > Patch attached. What do you think of this patch?
> > 
> > Calling initial_setlocale() is necessary because
> > nl_langinfo() always returns "ANSI_X3.4-1968"
> > regardless of the locale setting if this is not called.
> 
> Well, this is correct.  Per POSIX, a standard-conformant application is
> running in the "C" locale unless it calls setlocale() explicitly.
> That's one reason Cygwin defaults to UTF-8 internally.  Everything,
> including the terminal, is supposed to default to UTF-8.  That's the
> most sane default, even if an application is not locale-aware.
> 
> So, to follow POSIX, initial_setlocale() is used to set up the
> environment and command line stuff and then, before calling the
> application's main, Cygwin calls _setlocale_r (_REENT, LC_CTYPE, "C");
> to reset the app's default locale to "C".  That's why nl_langinfo()
> returns "ANSI_X3.4-1968".
> 
> However, the initial_setlocale() call in dll_crt0_1 calls
> internal_setlocale(), and *that* function sets the conversion functions
> for the internal conversions.  What it *doesn't* do at the moment is
> to store the charset name itself or, better, the equivalent codepage.
> 
> If we change that, setup_locale can simply go away.  Below is a patch
> doing just that.  Can you please check if that works in your test
> scenarios?
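
The nl_langinfo() behaviour discussed above can be checked from an
ordinary application with a small test like the one below.  This is only
a sketch: the helper names current_codeset() and show_codesets() are made
up for illustration, and the exact codeset strings depend on the platform
and on LANG / LC_*.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Copy the codeset name of the current LC_CTYPE locale into buf. */
static void
current_codeset (char *buf, size_t len)
{
  snprintf (buf, len, "%s", nl_langinfo (CODESET));
}

/* Record the codeset before and after the application opts in to the
   locale given by the environment.
   Returns 0 on success, -1 if setlocale() rejects the environment. */
static int
show_codesets (char *before, char *after, size_t len)
{
  /* No setlocale() call so far: per POSIX we are in the "C" locale. */
  current_codeset (before, len);

  /* Opt in to LANG / LC_* from the environment. */
  if (setlocale (LC_CTYPE, "") == NULL)
    return -1;
  current_codeset (after, len);
  return 0;
}
```

Calling show_codesets() at the start of main() and printing both buffers
shows the "C" locale codeset first and, if the environment selects another
locale, that locale's codeset second.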

I tried your patch, but unfortunately it does not work.
cygheap->locale.term_code_page is 0 in pty master.

If the following lines are moved in internal_setlocale(),

  const char *charset = __locale_charset (__get_global_locale ());
  debug_printf ("Global charset set to %s", charset);
  /* Store codepage to be utilized by pseudo console code. */
  cygheap->locale.term_code_page =
            __eval_codepage_from_internal_charset (charset);

to a point before

  /* Don't do anything if the charset hasn't actually changed. */
  if (cygheap->locale.mbtowc == __get_global_locale ()->mbtowc)
    return;

cygheap->locale.term_code_page is always 65001, even if mintty is
started by
mintty -o locale=ja_JP -o charset=CP932
or
mintty -o locale=ja_JP -o charset=EUCJP

Perhaps this is because LANG is not yet set properly when mintty
is started.
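
To illustrate why the position of the store relative to the early return
matters, here is a simplified model.  It is not the Cygwin code: struct
locale_state, codepage_for() and the two update functions are hypothetical
stand-ins, and the "store late" order is only my reading of the patch as
tested.

```c
#include <string.h>

/* Hypothetical stand-ins, not Cygwin's real types or functions. */
typedef int (*mbtowc_fn) (void);
static int utf8_mbtowc (void) { return 1; }
static int sjis_mbtowc (void) { return 2; }

struct locale_state
{
  mbtowc_fn mbtowc;          /* current conversion function */
  unsigned term_code_page;   /* 0 means "never stored" */
};

/* Toy charset-to-codepage lookup covering only two names. */
static unsigned
codepage_for (const char *charset)
{
  if (strcmp (charset, "UTF-8") == 0)
    return 65001;
  if (strcmp (charset, "CP932") == 0)
    return 932;
  return 0;
}

/* Store placed after the "nothing changed" check. */
static void
update_store_late (struct locale_state *st, mbtowc_fn global,
                   const char *charset)
{
  if (st->mbtowc == global)
    return;                  /* unchanged: code page is never stored */
  st->mbtowc = global;
  st->term_code_page = codepage_for (charset);
}

/* Store placed before the "nothing changed" check. */
static void
update_store_early (struct locale_state *st, mbtowc_fn global,
                    const char *charset)
{
  st->term_code_page = codepage_for (charset);
  if (st->mbtowc == global)
    return;
  st->mbtowc = global;
}
```

In this model, a state that already uses the UTF-8 conversion function
keeps term_code_page == 0 with update_store_late(), while
update_store_early() stores 65001.  The model does not explain why the
charset seen at that point is UTF-8 regardless of the mintty options.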

> However, there's something which worries me.  Why do we need or set the
> pseudo terminal codepage at all?  I see that you convert from MB charset
> to MB charset and then use the result in WriteFile to the connecting
> pipes.  Question is this: Why not just converting the strings via
> sys_mbstowcs to a UTF-16 string and then send that over the line, using
> WriteConsoleW for the final output to the console?  That would simplify
> this stuff quite a bit, wouldn't it?  After all, for writing UTF-16 to
> the console, we simply don't need to know or care for the console CP.

WriteConsoleW() cannot be used here because the handle to_master_cyg is
a pipe handle, not a console handle.

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>
