From: Brian Inglis <Brian.Inglis@SystematicSw.ab.ca>
To: cygwin@cygwin.com
Subject: Re: Trouble with character sets
Date: Mon, 3 Aug 2020 10:31:15 -0600
Message-ID: <ae1f8133-948a-4497-049b-b8349a138143@SystematicSw.ab.ca>
In-Reply-To: <OF3F4D2646.3A75682C-ON852585B5.0058983D-852585B9.0055B758@abinitio.com>

On 2020-08-03 09:36, Michael Shay via Cygwin wrote:
> I'm having a problem with Cygwin 3.1.4, changing the character set on the 
> fly. It seems to work with Cygwin applications, but not with Win32 
> applications.
> I have a Korn shell script:
> #!/bin/ksh
> OLD_LANG="$LANG"
> OLD_LC_ALL="$LC_ALL"
> echo "locale on entry"
> locale
> echo ""
> export LANG="en_US.CP1252"
> export LC_ALL=en_US.CP1252
> echo "locale changed to"
> locale
> echo ""
> # Default is to run the Win32 program. Input any argument other than 'WIN32'
> # to run '/bin/echo'.
> case $# in
>    0 )  echo "Running WIN32 pgm"
>         ksh -c 'cygtest.exe ZÇ'
>         ;;
>    1 )  echo "Running Cygwin 'echo'"
>         ksh -c '/bin/echo ZÇ'
>         ;;
>    2 )  echo "Running WIN32 pgm"
>         ksh -c 'cygtest.exe ZÇ'
>         echo ""
>         echo "Running Cygwin 'echo'"
>         ksh -c '/bin/echo ZÇ'
>         ;;
>    * ) ;;
> esac
> LC_ALL="$OLD_LC_ALL"
> LANG="$OLD_LANG"
> and a Win32 application (attached file cygtest.cpp)
> I used gdb to see what was happening in child_info_spawn::worker(), when a 
> Win32 program is started using:
>           rc = CreateProcessW (runpath,   /* image name w/ full path */
>                    cmd.wcs (wcmd),  /* what was passed to exec */
>                    sa,    /* process security attrs */
>                    sa,    /* thread security attrs */
>                    TRUE,    /* inherit handles */
>                    c_flags,
>                    envblock,  /* environment */
>                    NULL,
>                    &si,
>                    &pi);
> Specifically, 'cmd.wcs(wcmd)' invokes:
>   wchar_t *wcs (wchar_t *wbuf, size_t n)
>   {
>     if (n == 1)
>       wbuf[0] = L'\0';
>     else
>         sys_mbstowcs (wbuf, n, buf);
>     return wbuf;
>   }
> and sys_mbstowcs():
> size_t __reg3
> sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
> {
>   mbtowc_p f_mbtowc = __MBTOWC;
>   if (f_mbtowc == __ascii_mbtowc)
>     {
>       f_mbtowc = __utf8_mbtowc;   <<<<< this is ALWAYS done, no matter what charset is in use.
>     }
>   return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
> }
> Since CP1252 is a single-byte character set that uses byte values >= 
> 0x80, the '0xc7' character is always translated as '0xc7 0xf0', with the 
> '0xf0' byte indicating an invalid character in the string.
> This doesn't seem to happen when e.g. '/bin/echo' is run, although I 
> haven't stepped into the code to see what's happening.
> I do not think this is a Cygwin bug, but since the User's Guide says the 
> locale and charset can be changed on the fly, I don't know what's going 
> awry.
> Any suggestions? If you need more information, I'm happy to provide it.

Try checking the console code page with chcp.com and switching it to
UTF-8 (65001):

$ chcp.com
Active code page: 850
$ chcp.com 65001
Active code page: 65001
$ chcp.com
Active code page: 65001

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]
