From: Brian Inglis <Brian.Inglis@SystematicSw.ab.ca>
To: cygwin@cygwin.com
Subject: Re: Trouble with character sets
Date: Mon, 3 Aug 2020 10:31:15 -0600
Message-ID: <ae1f8133-948a-4497-049b-b8349a138143@SystematicSw.ab.ca>
In-Reply-To: <OF3F4D2646.3A75682C-ON852585B5.0058983D-852585B9.0055B758@abinitio.com>
On 2020-08-03 09:36, Michael Shay via Cygwin wrote:
> I'm having a problem with Cygwin 3.1.4, changing the character set on the
> fly. It seems to work with Cygwin applications, but not with Win32
> applications.
> I have a Korn shell script:
> #!/bin/ksh
> OLD_LANG="$LANG"
> OLD_LC_ALL="$LC_ALL"
> echo "locale on entry"
> locale
> echo ""
> export LANG="en_US.CP1252"
> export LC_ALL=en_US.CP1252
> echo "locale changed to"
> locale
> echo ""
> # Default is to run the Win32 program. Input any argument other than 'WIN32'
> # to run '/bin/echo'.
> case $# in
> 0 ) echo "Running WIN32 pgm"
> ksh -c 'cygtest.exe ZÇ'
> ;;
> 1 ) echo "Running Cygwin 'echo'"
> ksh -c '/bin/echo ZÇ'
> ;;
> 2 ) echo "Running WIN32 pgm"
> ksh -c 'cygtest.exe ZÇ'
> echo ""
> echo "Running Cygwin 'echo'"
> ksh -c '/bin/echo ZÇ'
> ;;
> * ) ;;
> esac
> LC_ALL="$OLD_LC_ALL"
> LANG="$OLD_LANG"
> and a Win32 application (attached file cygtest.cpp)
> I used gdb to see what was happening in child_info_spawn::worker(), when a
> Win32 program is started using:
> rc = CreateProcessW (runpath, /* image name w/ full path */
> cmd.wcs (wcmd), /* what was passed to exec */
> sa, /* process security attrs */
> sa, /* thread security attrs */
> TRUE, /* inherit handles */
> c_flags,
> envblock, /* environment */
> NULL,
> &si,
> &pi);
> Specifically, 'cmd.wcs(wcmd)' invokes:
> wchar_t *wcs (wchar_t *wbuf, size_t n)
> {
> if (n == 1)
> wbuf[0] = L'\0';
> else
> sys_mbstowcs (wbuf, n, buf);
> return wbuf;
> }
> and sys_mbstowcs():
> size_t __reg3
> sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
> {
> mbtowc_p f_mbtowc = __MBTOWC;
> if (f_mbtowc == __ascii_mbtowc)
> {
> f_mbtowc = __utf8_mbtowc; <<<<< this is ALWAYS done, no matter what charset is in use.
> }
> return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
> }
> Since CP1252 is an 8-bit single-byte character set with characters >=
> 0x80, the '0xc7' character is always translated as '0xc7 0xf0', with the
> '0xf0' byte indicating an invalid character in the string.
> This doesn't seem to happen when e.g. '/bin/echo' is run, although I
> haven't stepped into the code to see what's happening.
> I do not think this is a Cygwin bug, but since the User's Guide says the
> locale and charset can be changed on the fly, I don't know what's going
> awry.
> Any suggestions? If you need more information, I'm happy to provide it.
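The byte-level part of the description above can be checked from any POSIX shell, independent of Cygwin internals. This is only a sketch: it assumes `iconv`, `od`, and `xargs` are installed, and the helper names `hexbytes`, `cp1252_bytes`, and `is_valid_utf8` are made up for illustration. It shows that "ZÇ" in CP1252 ends in a lone 0xc7 byte, which a strict UTF-8 decoder rejects, and it does not reproduce what sys_mbstowcs() does with that byte.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; none of these names come from Cygwin.

# Print stdin as space-separated lowercase hex bytes on one line.
hexbytes() {
    od -An -tx1 | xargs
}

# Emit "ZÇ" as CP1252 bytes: 'Z' (0x5a) followed by octal 307 (0xc7).
cp1252_bytes() {
    printf 'Z\307'
}

# Succeed only if stdin is well-formed UTF-8 according to iconv.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-16LE >/dev/null 2>&1
}

cp1252_bytes | hexbytes                              # prints: 5a c7
cp1252_bytes | iconv -f CP1252 -t UTF-8 | hexbytes   # prints: 5a c3 87

# 0xc7 is a UTF-8 lead byte that needs a continuation byte after it,
# so the CP1252 string is not valid UTF-8 as it stands.
if cp1252_bytes | is_valid_utf8; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi
```

If the last line prints "not valid UTF-8", any converter that assumes UTF-8 input has to substitute or flag that byte, which would be consistent with the extra marker byte seen in gdb.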
Try checking the console code page and switching it to UTF-8 (65001):
$ chcp.com
Active code page: 850
$ chcp.com 65001
Active code page: 65001
$ chcp.com
Active code page: 65001
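If changing the code page helps, it may be worth restoring the previous one afterwards. A minimal sketch, assuming `chcp.com` is on the PATH and prints a line ending in the code page number as shown above; `parse_codepage` and `run_with_codepage` are hypothetical helper names, not part of Cygwin:

```shell
#!/bin/sh
# Hypothetical helpers for illustration; only chcp.com itself is a real command.

# Extract the number from chcp output such as "Active code page: 850".
# The trailing ".*" also swallows a carriage return if one is present.
parse_codepage() {
    sed -n 's/^.*: *\([0-9][0-9]*\).*$/\1/p'
}

# run_with_codepage CP CMD [ARGS...]
# Switch the console to code page CP, run CMD, then switch back.
run_with_codepage() {
    new_cp=$1
    shift
    old_cp=$(chcp.com | parse_codepage)
    chcp.com "$new_cp" >/dev/null
    "$@"
    status=$?
    # Only restore if the previous code page could be parsed.
    if [ -n "$old_cp" ]; then
        chcp.com "$old_cp" >/dev/null
    fi
    return "$status"
}
```

Usage would look like `run_with_codepage 65001 cygtest.exe 'ZÇ'`. Whether that changes what the Win32 program sees depends on how it reads its arguments and writes its output, so treat it as an experiment rather than a fix.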
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]