public inbox for cygwin-developers@cygwin.com
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Thomas Wolff <towo@towo.net>
Cc: cygwin-developers@cygwin.com
Subject: Re: New implementation of pseudo console support (experimental)
Date: Mon, 31 Aug 2020 21:17:45 +0200 (CEST)
Message-ID: <nycvar.QRO.7.76.6.2008312008560.56@tvgsbejvaqbjf.bet>
In-Reply-To: <1104c24d-49ea-96b9-30cb-acd4460108ab@towo.net>

Hi Thomas,

On Mon, 31 Aug 2020, Thomas Wolff wrote:

> On 31.08.2020 at 18:12, Thomas Wolff wrote:
> >
> > On 31.08.2020 at 17:56, Johannes Schindelin wrote:
> >
> > > On Mon, 31 Aug 2020, Takashi Yano wrote:
> > >
> > > > On Mon, 31 Aug 2020 16:22:20 +0200 (CEST)
> > > > Johannes Schindelin wrote:
> > > > > On Mon, 31 Aug 2020, Takashi Yano wrote:
> > > > >
> > > > > > On Mon, 31 Aug 2020 14:49:04 +0200 (CEST)
> > > > > > Johannes Schindelin wrote:
> > > > > >
> > > > > > > Sorry to latch onto this thread with something slightly
> > > > > > > different, but we do see pretty serious encoding problems
> > > > > > > (both with and without `CYGWIN=disable_pcon`) in the Git for
> > > > > > > Windows and the MSYS2 projects. For example, in
> > > > > > > https://github.com/msys2/MSYS2-packages/issues/1974 the
> > > > > > > following issue was reported. If you compile a _MINGW_
> > > > > > > program from this source code:
> > > > > > >
> > > > > > > -- snip --
> > > > > > > #include <stdio.h>
> > > > > > >
> > > > > > > int main(){
> > > > > > >    puts("Привет мир! Hello world!");
> > > > > > >    return 0;
> > > > > > > }
> > > > > > > -- snap --
> > > > > > >
> > > > > > > and then execute it, you will see this output:
> > > > > > >
> > > > > > > -- snip --
> > > > > > > Привет мир! Hello world!
> > > > > > > -- snap --
> > > > > >
> > > > > > I guess your program (binary exe) does not work as you expect
> > > > > > in command prompt as well. If you want to use UTF-8 coding in
> > > > > > > output, you should add SetConsoleOutputCP(CP_UTF8) call before
> > > > > > puts().
> > > > >
> > > > > That may be, but I would like to point out that the very same
> > > > > executable worked quite well in a MinTTY using v3.0.7...
> >
> > Assuming the test program source file is encoded in UTF-8 when
> > compiling with x86_64-w64-mingw32-gcc, the string would be output byte
> > by byte, which happened to be interpreted as UTF-8 when run in a
> > terminal on cygwin 3.0.7, although the program was not set up to use
> > UTF-8. The "correct" output was actually buggy behaviour, so current
> > cygwin has "fixed" it, to your disadvantage in this case. With ConPTY
> > support, matching encoding on Windows and terminal side needs to be
> > taken care of.
>
> My wording was misleading. Maybe it's proper to say it this way:
> Matching encoding on each side between application and respective system
> is needed, as ConPTY transforms encoding properly on system level.

Well, I just wonder how your wording (misleading or not) relates to the
issue at hand: there are programs out there that simply do not take care
of calling `SetConsoleOutputCP()`.
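
For reference, this is roughly what a program that does take care of it would look like. It is only a sketch: the function name `hello_utf8` is made up for illustration, the source file is assumed to be saved as UTF-8, and the `_WIN32` guard is there only so that the snippet also compiles outside of Windows.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: switch the console output code page to UTF-8
   (Windows only), then print the test string from the bug report.
   Returns 0 on success, -1 if either step fails. */
static int hello_utf8(void)
{
#ifdef _WIN32
    /* SetConsoleOutputCP() returns zero on failure. */
    if (!SetConsoleOutputCP(CP_UTF8))
        return -1;
#endif
    /* Assumes this source file is encoded in UTF-8, so the literal
       is emitted as raw UTF-8 bytes. */
    if (puts("Привет мир! Hello world!") == EOF)
        return -1;
    return 0;
}
```

Calling `hello_utf8()` from `main()` corresponds to the suggestion made earlier in this thread; programs that skip the `SetConsoleOutputCP()` call are the ones I am worried about.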

What you are telling me is that those programs are wrong, which I can kind
of get behind.

However, what I do not understand is what you argue should happen with the
output of such programs (if you address that concern at all, which I am
not really sure of).

Previously, we assumed the output to be in UTF-8 (although I frankly have
no idea how that worked). Starting with v3.1.0 (or at least v3.1.4, I have
not _really_ verified with earlier versions), the output is assumed to use
code page 437.
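
To make the byte-level picture concrete, here is a small sketch that dumps the bytes a program like the one above writes. The helper name `bytes_to_hex` is hypothetical. The point is only that the program emits raw UTF-8 bytes, and whatever sits between the program and the terminal decides which code page those bytes are decoded with.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the bytes of `s` as space-separated
   uppercase hex into `out`. Returns 0 on success, -1 if `out` is
   too small to hold the result. */
static int bytes_to_hex(const char *s, char *out, size_t out_size)
{
    size_t len = strlen(s);
    size_t pos = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        /* Every byte after the first is preceded by a space. */
        int n = snprintf(out + pos, out_size - pos,
                         i ? " %02X" : "%02X", (unsigned char) s[i]);
        if (n < 0 || (size_t) n >= out_size - pos)
            return -1;
        pos += (size_t) n;
    }
    return 0;
}
```

For the first two Cyrillic letters of the test string, passed as `"\xD0\x9F\xD1\x80"`, this yields `D0 9F D1 80`. Decoded as UTF-8 that is two characters; decoded as a single-byte code page such as 437 it is four unrelated ones.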

With seemingly everybody and their sister switching to UTF-8, I wonder
whether that even makes sense.

So I had a look at the code, and it seems that
`fhandler_pty_slave::setup_locale()` forces the output encoding to
C.ASCII if Pseudo Console support is enabled:

  char locale[ENCODING_LEN + 1] = "C";
  char charset[ENCODING_LEN + 1] = "ASCII";
  LCID lcid = get_langinfo (locale, charset);

  /* Set console code page from locale */
  if (get_pseudo_console ())
    {
      UINT code_page;
      if (lcid == 0 || lcid == (LCID) -1)
        code_page = 20127; /* ASCII */
      else if (!GetLocaleInfo (lcid,
                               LOCALE_IDEFAULTCODEPAGE | LOCALE_RETURN_NUMBER,
                               (char *) &code_page, sizeof (code_page)))
        code_page = 20127; /* ASCII */
      SetConsoleCP (code_page);
      SetConsoleOutputCP (code_page);
    }

Please note that this essentially forces the console output code page to
ASCII (in my case, the fall-back to 20127 seems not to kick in, but 437 is
used instead, as LCID 0x0409 is used).

However, there is no overriding call to `SetConsoleOutputCP()` later in
that method, not even when the `charset` is correctly identified as
`UTF-8` (because my `LANG=en_US.UTF-8`).

Now, what I _really_ do not understand is why Cygwin insists on using the
console output code page when running in `CYGWIN=disable_pcon` mode...

Otherwise, this patch would be enough to fix it for me:

-- snip --
diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 43eebc174..2ce8dae9a 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -2867,11 +2867,13 @@ fhandler_pty_slave::setup_locale (void)
   char charset[ENCODING_LEN + 1] = "ASCII";
   LCID lcid = get_langinfo (locale, charset);

-  /* Set console code page form locale */
+  /* Set console code page from locale */
   if (get_pseudo_console ())
     {
       UINT code_page;
-      if (lcid == 0 || lcid == (LCID) -1)
+      if (!strcasecmp (charset, "utf-8"))
+	code_page = CP_UTF8;
+      else if (lcid == 0 || lcid == (LCID) -1)
 	code_page = 20127; /* ASCII */
       else if (!GetLocaleInfo (lcid,
 			       LOCALE_IDEFAULTCODEPAGE | LOCALE_RETURN_NUMBER,
-- snap --

But that does _not_ reinstate the previous behavior when Pseudo Console
support is disabled.
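
To spell out the order of checks the patch is after, here is a standalone sketch of the selection logic, detached from the Cygwin sources. Everything in it is an assumption made for illustration: the function `pick_code_page`, its parameters, and the `SKETCH_*` constants do not exist in Cygwin, and the Windows locale lookup is replaced by a plain argument.

```c
#include <strings.h>  /* strcasecmp (POSIX) */

#define SKETCH_CP_UTF8  65001u  /* numeric value of CP_UTF8 on Windows */
#define SKETCH_CP_ASCII 20127u  /* the ASCII fall-back used in the patch */

/* Hypothetical stand-in for the code page choice in setup_locale().
   charset:    charset name derived from the locale, e.g. "UTF-8"
   lcid_valid: nonzero if a usable LCID was found for the locale
   locale_cp:  default code page looked up for that LCID, or 0 if
               the lookup failed */
static unsigned pick_code_page(const char *charset, int lcid_valid,
                               unsigned locale_cp)
{
    /* Proposed change: a UTF-8 charset wins over the LCID lookup. */
    if (!strcasecmp(charset, "utf-8"))
        return SKETCH_CP_UTF8;
    /* Existing behavior: fall back to ASCII without a usable locale. */
    if (!lcid_valid || locale_cp == 0)
        return SKETCH_CP_ASCII;
    return locale_cp;
}
```

With `LANG=en_US.UTF-8`, the first branch is taken and the result is 65001 rather than the 437 I am seeing now.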

Now, I would call that a regression (the entire idea of `disable_pcon` was
to fall back to the previous behavior, no?), and I do not really
understand where that regression comes from. Where does the code path
differ from the previous one when Pseudo Console support is disabled, and
how does that relate to the current console output code page?

Ciao,
Johannes

> > Thomas
> >
> > > > at the expense of garbled output for apps which use native
> > > > code page of the system in the correct manner.
> > > Are you referring to apps that call the SetConsoleOutputCP() function? If
> > > so, I am asking myself what would be broken. Because apps that do _not_
> > > call that function (expecting UTF-8 to be active) would be fixed, while
> > > apps that _do_ call that function would not care if the Cygwin runtime
> > > changed it.
> > >
> > > Ciao,
> > > Johannes
> >
>
>
