Hi Takashi, On Sep 4 23:50, Takashi Yano via Cygwin-patches wrote: > Hi Corinna, > > On Fri, 4 Sep 2020 14:44:00 +0200 > Corinna Vinschen wrote: > > On Sep 4 18:21, Takashi Yano via Cygwin-patches wrote: > > > I think I have found the answer to your request. > > > Patch attached. What do you think of this patch? > > > > > > Calling initial_setlocale() is necessary because > > > nl_langinfo() always returns "ANSI_X3.4-1968" > > > regardless locale setting if this is not called. > > [...] > > However, the initial_setlocale() call in dll_crt0_1 calls > > internal_setlocale(), and *that* function sets the conversion functions > > for the internal conversions. What it *doesn't* do yet at the moment is > > to store the charset name itself or, better, the equivalent codepage. > > > > If we change that, setup_locale can simply go away. Below is a patch > > doing just that. Can you please check if that works in your test > > scenarios? > > I tried your patch, but unfortunately it does not work. > cygheap->locale.term_code_page is 0 in pty master. > > If the following lines are moved in internal_setlocale(), > > const char *charset = __locale_charset (__get_global_locale ()); > debug_printf ("Global charset set to %s", charset); > /* Store codepage to be utilized by pseudo console code. */ > cygheap->locale.term_code_page = > __eval_codepage_from_internal_charset (charset); > > in internal_setlocale() before > > /* Don't do anything if the charset hasn't actually changed. */ > if (cygheap->locale.mbtowc == __get_global_locale ()->mbtowc) > return; Uh, that makes sense. > cygheap->locale.term_code_page is always 65001 even if mintty is > startted by > mintty -o locale=ja_JP -o charset=CP932 > or > mintty -o locale=ja_JP -o charset=EUCJP > > Perhaps, this is because LANG is not set properly yet when mintty > is started. Yeah, that's the reason. The above settings of locale and charset on the CLI should only take over when mintty calls setlocale() with a matching string. The fact that it sets the matching value in the environment, too, should only affect child processes, not mintty itself. But it's incorrect to call initial_setlocale() from setup_locale() without resetting it to its original value. Unfortunately that doesn't solve any problem with the pseudo console codepage. Drat. It sounds like you need the terminal's charset, rather than the one set in the environment. So this boils down to the fact that term_code_page must be set after the application is already running and as soo as it creates the pty, me thinks. What if __eval_codepage_from_internal_charset() is called at pty creation? Or even on reading from /writing to the pty the first time? That should always be late enough to fetch the correct codepage. Patch attached. Does that work as expected? > > However, there's something which worries me. Why do we need or set the > > pseudo terminal codepage at all? I see that you convert from MB charset > > to MB charset and then use the result in WriteFile to the connecting > > pipes. Question is this: Why not just converting the strings via > > sys_mbstowcs to a UTF-16 string and then send that over the line, using > > WriteConsoleW for the final output to the console? That would simplify > > this stuff quite a bit, wouldn't it? After all, for writing UTF-16 to > > the console, we simply don't need to know or care for the console CP. > > WriteConsoleW() cannot be used because the handle to_master_cyg is > not a console handle. Sigh, of course. It's all just pipes. I forgot that, sorry. Btw., the main loop in fhandler_pty_master::pty_master_fwd_thread() calls char *buf = convert_mb_str (cygheap->locale.term_code_page, &nlen, CP_UTF8, ptr, wlen); ^^^^^^^ [...] WriteFile (to_master_cyg, ... But then, after the code breaks from that loop, it calls char *buf = convert_mb_str (cygheap->locale.term_code_page, &nlen, GetConsoleOutputCP (), ptr, wlen); ^^^^^^^^^^^^^^^^^^^^^ [...] process_opost_output (to_master_cyg, ... process_opost_output then calls WriteFile on that to_master_cyg handle, just like the WriteFile call above. Is that really correct? Shouldn't the second invocation use CP_UTF8 as well? Corinna