public inbox for cygwin-patches@cygwin.com
 help / color / mirror / Atom feed
From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: cygwin-patches@cygwin.com
Subject: Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Date: Mon, 7 Sep 2020 10:26:33 +0200	[thread overview]
Message-ID: <20200907082633.GC4127@calimero.vinschen.de> (raw)
In-Reply-To: <20200905174301.adbb3c147122fbe0636a0d56@nifty.ne.jp>

Hi Takashi,

On Sep  5 17:43, Takashi Yano via Cygwin-patches wrote:
> On Fri, 4 Sep 2020 21:22:35 +0200
> Corinna Vinschen wrote:
> > So this boils down to the fact that term_code_page must be set
> > after the application is already running and as soo as it creates
> > the pty, me thinks.  What if __eval_codepage_from_internal_charset()
> > is called at pty creation?  Or even on reading from /writing to
> > the pty the first time?  That should always be late enough to fetch
> > the correct codepage.
> > 
> > Patch attached.  Does that work as expected?
> 
> Thank you very much for the patch.
> 
> Your new additional patch works well except the test case such as:
> 
>   int pm = getpt();
>   if (fork()) {
>     [do the master operations]
>   } else {
>     int ps = open(ptsname(pm), O_RDWR|O_NOCTTY);
>     close(pm);
>     setsid();
>     ioctl(ps, TIOCSCTTY, 1);
>     dup2(ps, 0);
>     dup2(ps, 1);
>     dup2(ps, 2);
>     close(ps);
>     [exec non-cygwin process]
>   }
> 
> If this test case is run in cygwin console (command prompt),
> it causes garbled output due to term_code_page == 0.

term_code_page is set on fhandler_pty_slave::open, which is what
you call above.  How can term_code_page be 0 after that?  Are you
talking about the forking parent process being master?  If so, either
switching `#if 1/#if 0' blocks in the patch might fix this (by setting
term_code_page on first read/write), or alternatively adding an
__eval_codepage_from_internal_charset() call to the master creation
as well.  Did you try enabling the #if 0'd blocks?

Either way, the call to __eval_codepage_from_internal_charset() should
take place as soon as the first pty gets created or on first pty
read/write.

__eval_codepage_from_internal_charset() could also simply be called
on *every* invocation of pty read/write, given how low profile it is.

Apart from that, I don't see anything wrong with the above scenario.
If the application creating the pty does *not* switch locale, we're
in "C" locale territory, and we should default to UTF-8.  That's
what a call to __eval_codepage_from_internal_charset() would do,
because it defaults to UTF-8 and never returns the ASCII codepage.

> > Btw., the main loop in fhandler_pty_master::pty_master_fwd_thread()
> > calls 
> > 
> >   char *buf = convert_mb_str (cygheap->locale.term_code_page,
> >                               &nlen, CP_UTF8, ptr, wlen);
> >                                      ^^^^^^^
> >   [...]
> >   WriteFile (to_master_cyg, ...
> > 
> > But then, after the code breaks from that loop, it calls
> > 
> >   char *buf = convert_mb_str (cygheap->locale.term_code_page, &nlen,
> >                               GetConsoleOutputCP (), ptr, wlen);
> >                               ^^^^^^^^^^^^^^^^^^^^^
> >   [...]
> >   process_opost_output (to_master_cyg, ...
> > 
> > process_opost_output then calls WriteFile on that to_master_cyg handle,
> > just like the WriteFile call above.
> > 
> > Is that really correct?  Shouldn't the second invocation use CP_UTF8 as
> > well?
> 
> That is correct. The first conversion is for the case that pseudo
> console is enabled, and the second one is for the case that pseudo
> console is disabled.
> 
> Pseudo console converts charset from console code page to UTF-8.
> Therefore, data read from from_slave is always UTF-8 when pseudo
> console is enabled. Moreover, OPOST processing is done in pseudo
> console, so write data simply by WriteFile() is enough.
> 
> If pseudo console is disabled, cmd.exe and so on uses console
> code page, so the code page of data read from from_slave is
> GetConsoleOutputCP(). In this case, OPOST processing is necessary.

This is really confusing me.  We never set the console codepage in the
old pty code before, it was just pipes transmitting bytes.  Why do we
suddenly have to handle native apps running in a console in this case?!?

> diff --git a/winsup/cygwin/cygheap.h b/winsup/cygwin/cygheap.h
> index 2b84f4252..8877cc358 100644
> --- a/winsup/cygwin/cygheap.h
> +++ b/winsup/cygwin/cygheap.h
> @@ -341,7 +341,6 @@ struct cygheap_debug
>  struct cygheap_locale
>  {
>    mbtowc_p mbtowc;
> -  UINT term_code_page;

No, wait.  Just reverting this change without checking the alternatives
doesn't make sense.

Why would we want to store the codepage in every single tty, given
term_code_page is set to the same value in every one of them?  AFAICS
it's sufficient to have a single term_code_page shared with the child
processes via cygheap.  The idea is to get rid of the complex
setup_locale code in every execve call and just set it once in a process
tree starting at the process creating the ptys.


Corinna

  parent reply	other threads:[~2020-09-07  8:26 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-01 16:19 Johannes Schindelin
2020-09-02  6:06 ` Johannes Schindelin
2020-09-02  8:30 ` Corinna Vinschen
2020-09-02  8:38   ` Corinna Vinschen
2020-09-02 10:54     ` Takashi Yano
2020-09-02 15:24       ` Corinna Vinschen
2020-09-02 16:09         ` Corinna Vinschen
2020-09-02 16:25         ` Takashi Yano
2020-09-02 16:38           ` Corinna Vinschen
2020-09-03 17:59             ` Corinna Vinschen
2020-09-04  9:21               ` Takashi Yano
2020-09-04 12:44                 ` Corinna Vinschen
2020-09-04 14:05                   ` Brian Inglis
2020-09-04 14:50                   ` Takashi Yano
2020-09-04 19:22                     ` Corinna Vinschen
2020-09-05  8:43                       ` Takashi Yano
2020-09-05 11:15                         ` Takashi Yano
2020-09-05 14:15                           ` Takashi Yano
2020-09-06  8:57                             ` Takashi Yano
2020-09-06 10:15                               ` Takashi Yano
2020-09-06 16:04                                 ` Takashi Yano
2020-09-07  4:45                                   ` Takashi Yano
2020-09-07  9:08                                     ` Corinna Vinschen
2020-09-07  9:54                                       ` Takashi Yano
2020-09-07  9:59                                         ` Takashi Yano
2020-09-08  8:40                                     ` Corinna Vinschen
2020-09-08  9:45                                       ` Takashi Yano
2020-09-08 19:16                                         ` Corinna Vinschen
2020-09-10 13:08                                         ` Takashi Yano
2020-09-07  8:27                           ` Corinna Vinschen
2020-09-07  8:38                             ` Takashi Yano
2020-09-07  9:09                               ` Corinna Vinschen
2020-09-07  8:26                         ` Corinna Vinschen [this message]
2020-09-07  9:36                           ` Takashi Yano
2020-09-07 18:24                             ` Takashi Yano
2020-09-07 21:08                             ` Johannes Schindelin
2020-09-08  4:52                               ` Brian Inglis
2020-09-07 10:27                           ` Takashi Yano
2020-09-07 13:40                             ` Takashi Yano
2020-09-08  7:55                               ` Corinna Vinschen
2020-09-06 10:28                   ` Takashi Yano
2020-09-07  8:33                     ` Corinna Vinschen
2020-09-02  9:41   ` Takashi Yano
2020-09-02  6:26     ` Johannes Schindelin
2020-09-02 13:06       ` Takashi Yano
2020-09-02  9:12         ` Johannes Schindelin
2020-09-02 14:52           ` Takashi Yano
2020-09-04 10:03 ` Takashi Yano
2020-09-04  6:23   ` Johannes Schindelin
2020-09-04 15:03     ` Takashi Yano
2020-09-07 21:17       ` Johannes Schindelin
2020-09-08  8:16         ` Takashi Yano
2020-09-09  7:21           ` Corinna Vinschen
2020-09-10  0:15             ` Takashi Yano
2020-09-10 12:34               ` Takashi Yano
2020-09-11  9:05                 ` Corinna Vinschen
2020-09-11  9:23                   ` Corinna Vinschen
2020-09-10 14:04               ` Corinna Vinschen
2020-09-10 14:16                 ` Takashi Yano
2020-09-10 14:18                   ` Takashi Yano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200907082633.GC4127@calimero.vinschen.de \
    --to=corinna-cygwin@cygwin.com \
    --cc=cygwin-patches@cygwin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).