From: Johannes Schindelin
Date: Mon, 31 Aug 2020 21:17:45 +0200 (CEST)
To: Thomas Wolff
cc: cygwin-developers@cygwin.com
Subject: Re: New implementation of pseudo console support (experimental)
In-Reply-To: <1104c24d-49ea-96b9-30cb-acd4460108ab@towo.net>
List-Id: Cygwin core component developers mailing list

Hi Thomas,

On Mon, 31 Aug 2020, Thomas Wolff wrote:

> On 31.08.2020 at 18:12, Thomas Wolff wrote:
> >
> > On 31.08.2020 at 17:56, Johannes Schindelin wrote:
> > > On Mon, 31 Aug 2020, Takashi Yano wrote:
> > >
> > > > On Mon, 31 Aug 2020 16:22:20 +0200 (CEST)
> > > > Johannes Schindelin wrote:
> > > > > On Mon, 31 Aug 2020, Takashi Yano wrote:
> > > > >
> > > > > > On Mon, 31 Aug 2020 14:49:04 +0200 (CEST)
> > > > > > Johannes Schindelin wrote:
> > > > > >
> > > > > > > Sorry to latch onto this thread with something slightly
> > > > > > > different, but we do see pretty serious encoding problems
> > > > > > > (both with and without `CYGWIN=disable_pcon`) in the Git for
> > > > > > > Windows and the MSYS2 projects. For example, in
> > > > > > > https://github.com/msys2/MSYS2-packages/issues/1974 the
> > > > > > > following issue was reported. If you compile a _MINGW_
> > > > > > > program from this source code:
> > > > > > >
> > > > > > > -- snip --
> > > > > > > #include <stdio.h>
> > > > > > >
> > > > > > > int main(){
> > > > > > >    puts("Привет мир! Hello world!");
> > > > > > >    return 0;
> > > > > > > }
> > > > > > > -- snap --
> > > > > > >
> > > > > > > and then execute it, you will see this output:
> > > > > > >
> > > > > > > -- snip --
> > > > > > > ╨ƒ╤Ç╨╕╨▓╨╡╤é ╨╝╨╕╤Ç! Hello world!
> > > > > > > -- snap --
> > > > > >
> > > > > > I guess your program (binary exe) does not work as you expect
> > > > > > in command prompt as well. If you want to use UTF-8 coding in
> > > > > > output, you should add SetConsoleOutputCP(CP_UTF8) call before
> > > > > > puts().
> > > > >
> > > > > That may be, but I would like to point out that the very same
> > > > > executable worked quite well in a MinTTY using v3.0.7...
> >
> > Assuming the test program source file is encoded in UTF-8 when
> > compiling with x86_64-w64-mingw32-gcc, the string would be output byte
> > by byte, which happened to be interpreted in UTF-8 when run in a
> > terminal on cygwin 3.0.7, although the program was not set up to use
> > UTF-8. The "correct" output was actually buggy behaviour, so current
> > cygwin has "fixed" it, to your disadvantage in this case. With ConPTY
> > support, matching encoding on Windows and terminal side needs to be
> > taken care of.
>
> My wording was misleading. Maybe it's proper to say it this way:
> Matching encoding on each side between application and respective system
> is needed, as ConPTY transforms encoding properly on system level.

Well, I just wonder how your wording (misleading or not) relates to the
issue at hand: there are programs out there that simply do not take care
of calling `SetConsoleOutputCP()`.

What you are telling me is that those programs are wrong, which I can kind
of get behind. However, what I do not understand is what you argue should
happen with the output of such programs (if you address that concern at
all, which I am not really sure of).
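
The garbled output quoted above is what results when the UTF-8 bytes of
the string are read one by one as code page 437 characters. A minimal
sketch in portable C (no Windows API involved) that reproduces this
reinterpretation is shown here. The function names are hypothetical, and
the table is deliberately partial: it is an assumption of this sketch that
only the high bytes occurring in the UTF-8 encoding of "Привет мир" need
to be covered.

```c
#include <assert.h>
#include <stddef.h>

/* Partial CP437 table (assumption: only the bytes needed for this demo). */
struct cp437_entry { unsigned char byte; unsigned int codepoint; };

static const struct cp437_entry cp437_partial[] = {
    {0x80, 0x00C7}, /* Ç */
    {0x82, 0x00E9}, /* é */
    {0x9F, 0x0192}, /* ƒ */
    {0xB2, 0x2593}, /* ▓ */
    {0xB5, 0x2561}, /* ╡ */
    {0xB8, 0x2555}, /* ╕ */
    {0xBC, 0x255D}, /* ╝ */
    {0xD0, 0x2568}, /* ╨ */
    {0xD1, 0x2564}, /* ╤ */
};

/* ASCII maps to itself; high bytes missing from the table become '?'. */
static unsigned int cp437_lookup(unsigned char b)
{
    size_t i;
    if (b < 0x80)
        return b;
    for (i = 0; i < sizeof cp437_partial / sizeof cp437_partial[0]; i++)
        if (cp437_partial[i].byte == b)
            return cp437_partial[i].codepoint;
    return '?';
}

/* Write one code point below U+10000 as UTF-8; returns bytes written. */
static size_t put_utf8(unsigned int cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Treat each input byte as a CP437 character and re-encode it as UTF-8.
   Returns the length written (without the terminator), or 0 if the
   output buffer is too small. */
size_t cp437_bytes_to_utf8(const unsigned char *in, size_t n,
                           char *out, size_t cap)
{
    size_t i, used = 0;
    assert(in != NULL && out != NULL);
    if (cap == 0)
        return 0;
    for (i = 0; i < n; i++) {
        if (used + 4 > cap) /* up to 3 bytes plus the terminator */
            return 0;
        used += put_utf8(cp437_lookup(in[i]), out + used);
    }
    out[used] = '\0';
    return used;
}
```

Feeding it the twelve UTF-8 bytes of "Привет" (D0 9F D1 80 D0 B8 D0 B2 D0
B5 D1 82) yields the same sequence, ╨ƒ╤Ç╨╕╨▓╨╡╤é, that appears in the
report.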

Previously, we assumed the output to be in UTF-8 (although I frankly have
no idea how that worked). Starting with v3.1.0 (or at least v3.1.4, I have
not _really_ verified with earlier versions), the output is assumed to use
code page 437. With seemingly everybody and their sister switching to
UTF-8, I wonder whether that even makes sense.

So I had a look at the code, and it seems that
`fhandler_pty_slave::setup_locale()` forces the output encoding to C.ASCII
if Pseudo Console support is enabled:

  char locale[ENCODING_LEN + 1] = "C";
  char charset[ENCODING_LEN + 1] = "ASCII";
  LCID lcid = get_langinfo (locale, charset);

  /* Set console code page from locale */
  if (get_pseudo_console ())
    {
      UINT code_page;
      if (lcid == 0 || lcid == (LCID) -1)
        code_page = 20127; /* ASCII */
      else if (!GetLocaleInfo (lcid,
                               LOCALE_IDEFAULTCODEPAGE | LOCALE_RETURN_NUMBER,
                               (char *) &code_page, sizeof (code_page)))
        code_page = 20127; /* ASCII */
      SetConsoleCP (code_page);
      SetConsoleOutputCP (code_page);
    }

Please note that this essentially forces the console output code page to
ASCII (in my case, the fall-back to 20127 seems not to kick in, but 437 is
used instead, as LCID 0x0409 is used). However, there is no overriding
call to `SetConsoleOutputCP()` later in that method, not even when the
`charset` is correctly identified as `UTF-8` (because my
`LANG=en_US.UTF-8`).

Now, what I _really_ do not understand is why Cygwin insists on using the
console output code page when running in `CYGWIN=disable_pcon` mode...

Otherwise, this patch would be enough to fix it for me:

-- snip --
diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 43eebc174..2ce8dae9a 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -2867,11 +2867,13 @@ fhandler_pty_slave::setup_locale (void)
   char charset[ENCODING_LEN + 1] = "ASCII";
   LCID lcid = get_langinfo (locale, charset);
 
-  /* Set console code page form locale */
+  /* Set console code page from locale */
   if (get_pseudo_console ())
     {
       UINT code_page;
-      if (lcid == 0 || lcid == (LCID) -1)
+      if (!strcasecmp (charset, "utf-8"))
+        code_page = CP_UTF8;
+      else if (lcid == 0 || lcid == (LCID) -1)
         code_page = 20127; /* ASCII */
       else if (!GetLocaleInfo (lcid,
                                LOCALE_IDEFAULTCODEPAGE | LOCALE_RETURN_NUMBER,
-- snap --

But that does _not_ reinstate the previous behavior when Pseudo Console
support is disabled.

Now, I would call that a regression (the entire idea of `disable_pcon` was
to fall back to the previous behavior, no?). And I do not really
understand where that regression comes from. Where does the code path
differ from the previous one when Pseudo Console support is disabled, and
how does that relate to the current console output code page?

Ciao,
Johannes

>
> Thomas
> >
> > > > at the expense of garbled output for apps which use native
> > > > code page of the system in the correct manner.
> > > Are you referring to apps that call the SetConsoleOutputCP()
> > > function? If so, I am asking myself what would be broken. Because
> > > apps that do _not_ call that function (expecting UTF-8 to be active)
> > > would be fixed, while apps that _do_ call that function would not
> > > care if the Cygwin runtime changed it.
> > >
> > > Ciao,
> > > Johannes
> >