Hi Takashi,

On Sep 4 18:21, Takashi Yano via Cygwin-patches wrote:
> Hi Corinna,
>
> On Thu, 3 Sep 2020 19:59:12 +0200
> Corinna Vinschen wrote:
> > The only idea I had so far was, changing the way __set_charset_from_locale
> > works from within _setlocale_r:
> >
> > We could add a Cygwin-specific function only fetching the codepage and
> > call it unconditionally from _setlocale_r. __set_charset_from_locale is
> > called with a new parameter "codepage", so it doesn't have to fetch the
> > CP by itself, but it's still only called from _setlocale_r if necessary.
> >
> > Would that be sufficient? The CP conversion from 20127/ASCII to 65001/UTF8
> > could be done at the point the codepage is actually required.
>
> I think I have found the answer to your request.
> Patch attached. What do you think of this patch?
>
> Calling initial_setlocale() is necessary because
> nl_langinfo() always returns "ANSI_X3.4-1968"
> regardless locale setting if this is not called.

Well, this is correct. Per POSIX, a standard-conformant application is
running in the "C" locale unless it calls setlocale() explicitly.

That's one reason Cygwin defaults to UTF-8 internally. Everything,
including the terminal, is supposed to default to UTF-8. That's the
most sane default, even if an application is not locale-aware.

So, to follow POSIX, initial_setlocale() is used to set up the
environment and command line stuff and then, before calling the
application's main, Cygwin calls

  _setlocale_r (_REENT, LC_CTYPE, "C");

to reset the app's default locale to "C". That's why nl_langinfo()
returns "ANSI_X3.4-1968".

However, the initial_setlocale() call in dll_crt0_1 calls
internal_setlocale(), and *that* function sets the conversion functions
for the internal conversions. What it *doesn't* do yet is to store the
charset name itself or, better, the equivalent codepage.

If we change that, setup_locale can simply go away. Below is a patch
doing just that.
Can you please check if that works in your test scenarios?

However, there's something which worries me. Why do we need or set the
pseudo terminal codepage at all? I see that you convert from MB charset
to MB charset and then use the result in WriteFile to the connecting
pipes. The question is this: Why not just convert the strings via
sys_mbstowcs to a UTF-16 string and then send that over the line, using
WriteConsoleW for the final output to the console?

That would simplify this stuff quite a bit, wouldn't it? After all, for
writing UTF-16 to the console, we simply don't need to know or care
about the console CP.


Thanks,
Corinna
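
For reference, a minimal sketch of the POSIX default described above,
using only the standard setlocale() and nl_langinfo() calls. It is not
part of the patch and does not touch any Cygwin internals; the helper
names current_codeset and codeset_from_environment are made up for this
illustration.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the codeset name of the current LC_CTYPE
   locale into buf.  nl_langinfo() may return a pointer to storage that
   later calls overwrite, so the result is copied out immediately.  */
void
current_codeset (char *buf, size_t len)
{
  assert (buf != NULL && len > 0);
  strncpy (buf, nl_langinfo (CODESET), len - 1);
  buf[len - 1] = '\0';
}

/* Hypothetical helper: switch LC_CTYPE to whatever the environment
   (LC_ALL, LC_CTYPE, LANG) requests and report the resulting codeset.
   Returns 0 on success, -1 if the requested locale is not available,
   in which case buf is left untouched.  */
int
codeset_from_environment (char *buf, size_t len)
{
  if (setlocale (LC_CTYPE, "") == NULL)
    return -1;
  current_codeset (buf, len);
  return 0;
}
```

Calling current_codeset() first thing in main() reports the codeset of
the "C" locale, because nothing has called setlocale() yet; on Cygwin
that is the "ANSI_X3.4-1968" mentioned above. Only after
codeset_from_environment() succeeds does the result follow the user's
locale settings.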