Date: Tue, 1 Sep 2020 06:46:53 +0200 (CEST)
From: Johannes Schindelin
To: cygwin-developers@cygwin.com
cc: Corinna Vinschen
Subject: Re: New implementation of pseudo console support (experimental)
In-Reply-To: <20200831193736.GG3272@calimero.vinschen.de>
List-Id: Cygwin core component developers
 mailing list

Hi Corinna,

On Mon, 31 Aug 2020, Corinna Vinschen wrote:

> On Aug 31 21:17, Johannes Schindelin wrote:
> > [...]
> > So I had a look at the code, and it seems that
> > `fhandler_pty_slave::setup_locale()` forces the output encoding to
> > C.ASCII if Pseudo Console support is enabled:
> >
> >   char locale[ENCODING_LEN + 1] = "C";
> >   char charset[ENCODING_LEN + 1] = "ASCII";
> >   LCID lcid = get_langinfo (locale, charset);
> >
> >   /* Set console code page from locale */
> >   if (get_pseudo_console ())
> >     {
> >       UINT code_page;
> >       if (lcid == 0 || lcid == (LCID) -1)
> >         code_page = 20127; /* ASCII */
>
> This looks wrong, actually.  The default behaviour of Cygwin since
> Cygwin 1.7 was to assume UTF-8, even if the application doesn't call
> setlocale.  This means the locale is "C", so ASCII is expected.
> However, even in this case, the internal conversions use UTF-8.
> See function internal_setlocale() in nlsfuncs.cc, lines 1553/1554.
>
> We never switched the console codepage, though, because the codepage
> doesn't make much sense when using wide character functions only,
> i. e. WriteConsoleW.  Only the alternate charset is 437/ASCII.  So,
> if the pseudo console actually *requires* to set the charset...

Well, it is worse, as I have reported elsewhere in this thread. For some
reason (which was not answered yet, and which I am still very much
interested in knowing), the Console output code page is _still_ used in
`disable_pcon`.

That smells completely wrong. Why would the actual Console output encoding
be involved when Pseudo Console support is disabled, when it was not at
all used in v3.0.7 (which is supposedly using the same code paths that
`disable_pcon` is still expected to use)?

> >
> >       else if (!GetLocaleInfo (lcid,
> >                                LOCALE_IDEFAULTCODEPAGE | LOCALE_RETURN_NUMBER,
> >                                (char *) &code_page, sizeof (code_page)))
> >         code_page = 20127; /* ASCII */
> >       SetConsoleCP (code_page);
> >       SetConsoleOutputCP (code_page);
>
> can we please default to UTF-8 here even if the code page is ASCII?

Yes, please. In fact, I am tempted to do this:

-- snip --
diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 43eebc174..65b4d45fa 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -2867,7 +2867,16 @@ fhandler_pty_slave::setup_locale (void)
   char charset[ENCODING_LEN + 1] = "ASCII";
   LCID lcid = get_langinfo (locale, charset);

-  /* Set console code page form locale */
+  /* Special-case the UTF-8 character set */
+  if (strcasecmp (charset, "UTF-8") == 0)
+    {
+      get_ttyp ()->term_code_page = CP_UTF8;
+      SetConsoleCP (CP_UTF8);
+      SetConsoleOutputCP (CP_UTF8);
+      return;
+    }
+
+  /* Set console code page from locale */
   if (get_pseudo_console ())
     {
       UINT code_page;
-- snap --

The main reason why I am hesitating is that I smell a bigger problem here:
the mere fact that a code path that is supposed not to use Console
functions at all (`disable_pcon`) _does_ respect the output code page
indicates to me that that code path was changed in a totally unintended
way.

Ciao,
Johannes
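
To make the ordering in the proposed patch concrete, here is a minimal
standalone sketch of the decision it would introduce: a UTF-8 charset wins
outright, otherwise the locale's default code page is used, with ASCII
(20127) as the fallback when the lookup fails. The function
`choose_code_page` and its parameters are hypothetical names for
illustration, not Cygwin code; the real `setup_locale()` obtains these
values from `get_langinfo` and `GetLocaleInfo` and applies the result with
`SetConsoleCP`/`SetConsoleOutputCP`. The sketch also leaves out the
`get_pseudo_console ()` check that guards the non-UTF-8 branch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Numeric values of the Windows code pages discussed in the thread.
// The constant names are hypothetical; Windows itself defines CP_UTF8.
constexpr unsigned kCpUtf8 = 65001;   // same value as CP_UTF8
constexpr unsigned kCpAscii = 20127;  // US-ASCII

// Case-insensitive string comparison, standing in for strcasecmp.
static bool iequals (const std::string &a, const std::string &b)
{
  if (a.size () != b.size ())
    return false;
  for (std::size_t i = 0; i < a.size (); ++i)
    if (std::tolower (static_cast<unsigned char> (a[i]))
        != std::tolower (static_cast<unsigned char> (b[i])))
      return false;
  return true;
}

// Hypothetical helper (not a Cygwin function).
//   charset          - charset name reported for the current locale
//   lookup_ok        - whether the locale's default code page could be queried
//   locale_code_page - that default code page (ignored if lookup_ok is false)
unsigned choose_code_page (const std::string &charset, bool lookup_ok,
                           unsigned locale_code_page)
{
  // Special case from the patch: UTF-8 takes precedence over everything.
  if (iequals (charset, "UTF-8"))
    return kCpUtf8;
  // Existing behaviour: use the locale's code page, else fall back to ASCII.
  return lookup_ok ? locale_code_page : kCpAscii;
}
```

For example, `choose_code_page ("utf-8", false, 0)` yields 65001, while
`choose_code_page ("ASCII", false, 0)` falls back to 20127.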