From: "Michael Shay" <MShay@ABINITIO.COM>
To: cygwin@cygwin.com
Subject: Re: Trouble with output character sets from Win32 applications running under mintty
Date: Mon, 3 Aug 2020 18:05:18 -0400
Message-ID: <OFE0AAB507.AC9FD3B4-ON852585B9.0076DEA7-852585B9.007955FA@abinitio.com>
In-Reply-To: <d8133245-02f0-71a7-e409-bf3b82fc7756@SystematicSw.ab.ca>
Michael
From: "Brian Inglis" <Brian.Inglis@SystematicSw.ab.ca>
To: cygwin@cygwin.com
Date: 08/03/2020 05:23 PM
Subject: Re: Trouble with output character sets from Win32 applications running under mintty
Sent by: "Cygwin" <cygwin-bounces@cygwin.com>
On 2020-08-03 11:42, Andrey Repin wrote:
>> Doesn't help. I tried 65001 (UTF-8):
>
> Because you're confusing things.
> chcp has nothing to do with LANG or LC_*.
> Et vice versa.
>
> chcp sets console code page for native console applications. Only for
> those supporting it. Many do not.
> LANG sets output parameters for Cygwin applications (and other programs
> that look for it, but these are few).
You cut the significant statement at the top of the OP:
>> I'm having a problem with Cygwin 3.1.4, changing the character set on
>> the fly. It seems to work with Cygwin applications, but not with Win32
>> applications.
He has problems with invalid characters only running win32 console
applications:
I changed the subject to hopefully better reflect the issue.
I am unsure where Cygwin 3.1.4 comes into Win32 applications - you have to
use the Windows codepage conversion routines.
You can only change input character sets on the fly; output character sets
will depend on mintty support of xterm-compatible character set support and
switching escape sequences; if you set up UCS16LE console output, Windows
and mintty should handle it.
Perhaps a better description of your environment, build tools, what you
are trying to do, what you expect as output, and what you are getting as
output, could help us better understand and help with the issue you see.
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]
The script I sent changes the locale information, i.e. LANG and LC_ALL are
set to en_US.CP1252:

export LANG="en_US.CP1252"
export LC_ALL=en_US.CP1252

Then it runs a simple Win32 program that takes a single input argument,
ZÇ. The second character is C-cedilla, an 8-bit character with hex value
0xc7. The Win32 program transcodes the input Unicode argument, using the
Cygwin character set to determine the codepage, 1252. It then prints the
transcoded characters to stdout, and the result should be ZÇ, identical to
the input argument. This works fine using Cygwin 1.7.28.

Cygwin 3.1.4 is launching the Win32 application, and is responsible for
transcoding the arguments passed to it by mksh, in this case the CP1252
characters ZÇ, into Unicode. That means Cygwin has to use the mb-to-uc
function for transcoding codepage 1252 to Unicode. It does not. It uses the
UTF-8 to Unicode function (I've seen this using gdb). That function flags
the Ç as an invalid UTF-8 sequence, not surprisingly, since it's not a
UTF-8 character. No matter what character set I use in 'export LANG...' and
'export LC_ALL...', Cygwin 3.1.4 always uses the utf8-to-wc transcoding
function, whereas 1.7.28 uses the correct function.

I'm not using mintty; I'm using mksh. That is a requirement, since our
software uses lots of shell scripts and, for legacy support, that means
using a Korn shell. I could understand it if 1.7.28 didn't do the proper
transcoding, but it does.
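
To make the 0xc7 point concrete, here is a minimal sketch of why the same
byte is fine as CP1252 but rejected by a UTF-8 decoder. This is not Cygwin
code: the helper names cp1252_to_codepoint and utf8_decode_one are made up
for illustration, the CP1252 helper skips the 0x80-0x9F range, and the
UTF-8 helper is not a complete validator.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: map one CP1252 byte to a Unicode code point.
// ASCII and 0xA0-0xFF map to the same numeric value; 0x80-0x9F would
// need a lookup table, so this sketch reports them as -1.
long cp1252_to_codepoint(unsigned char b) {
    if (b < 0x80 || b >= 0xA0) return b;
    return -1;
}

// Hypothetical helper: decode one UTF-8 sequence from s (n bytes available).
// Returns the number of bytes consumed, or 0 if the sequence is invalid or
// truncated. Overlong forms and surrogates are not checked in this sketch.
std::size_t utf8_decode_one(const unsigned char *s, std::size_t n, long *cp) {
    if (n == 0) return 0;
    unsigned char b0 = s[0];
    std::size_t len;
    long value;
    if (b0 < 0x80)                { len = 1; value = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; value = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; value = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; value = b0 & 0x07; }
    else return 0;                // stray continuation or invalid lead byte
    if (len > n) return 0;        // lead byte promises more bytes than exist
    for (std::size_t i = 1; i < len; ++i) {
        if ((s[i] & 0xC0) != 0x80) return 0;  // not a continuation byte
        value = (value << 6) | (s[i] & 0x3F);
    }
    *cp = value;
    return len;
}
```

With the argument bytes 0x5A 0xC7, the CP1252 helper yields U+005A and
U+00C7, while the UTF-8 helper accepts 0x5A and then fails on 0xC7, because
0xC7 is a two-byte lead byte with no continuation byte after it.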
I used:
gdb mksh
to load mksh into the debugger, then started it with
start -c 'cygtest.exe ZÇ'
That allowed me to step into child_info_spawn::worker() and stop at the
call to CreateProcess(), where the command line (cygtest.exe) and argument
(ZÇ) are translated into Unicode.
This is the code to which I'm referring, in strfuncs.cc, which is supposed
to translate the command line and arguments from CP 1252 into Unicode.
size_t __reg3
sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
{
  mbtowc_p f_mbtowc = __MBTOWC;
  if (f_mbtowc == __ascii_mbtowc)
    {
      /* <<<< THE CODE CHANGES THE '__ascii_mbtowc' TO '__utf8_mbtowc'
         EVERY TIME, REGARDLESS OF THE CODEPAGE. */
      f_mbtowc = __utf8_mbtowc;
    }
  return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
}
So 'f_mbtowc' is set to '__ascii_mbtowc', the default.

You said:
> You can only change input character sets on the fly;
The input character set to Cygwin should have been changed to CP 1252, as
it was in 1.7.28. At least, that's what I would expect to happen. If it
does not, or if mintty is required, then that's a regression from 1.7.28.
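
As a rough model of the behaviour I would expect versus what the
sys_mbstowcs snippet does, consider the sketch below. It is only an
illustration: the Converter enum and the functions charset_of,
converter_for_charset and effective_converter are hypothetical names, not
anything in Cygwin, and the only part taken from the real code is the
"ASCII is replaced by UTF-8" fallback.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the converter functions named in the snippet.
enum class Converter { Ascii, Utf8, Cp1252 };

// Hypothetical: extract the charset part of a LANG/LC_ALL value,
// e.g. "en_US.CP1252" -> "CP1252". Returns "" if there is no charset.
std::string charset_of(const std::string &lang) {
    std::string::size_type dot = lang.find('.');
    if (dot == std::string::npos) return "";
    std::string::size_type at = lang.find('@', dot);
    std::string::size_type count =
        (at == std::string::npos) ? std::string::npos : at - dot - 1;
    return lang.substr(dot + 1, count);
}

// Hypothetical: the converter a locale charset would select if the locale
// were actually applied in the process. Only two charsets are modelled.
Converter converter_for_charset(const std::string &charset) {
    if (charset == "UTF-8") return Converter::Utf8;
    if (charset == "CP1252") return Converter::Cp1252;
    return Converter::Ascii;  // default when nothing else is known
}

// Models the fallback in the snippet: an ASCII converter is swapped for
// UTF-8, and any other converter is used unchanged.
Converter effective_converter(Converter from_locale) {
    return from_locale == Converter::Ascii ? Converter::Utf8 : from_locale;
}
```

In this model, effective_converter(converter_for_charset(charset_of("en_US.CP1252")))
gives Cp1252, which is what I expected. What I see in gdb corresponds to the
other branch: the starting value is the ASCII default, so the fallback
selects UTF-8 regardless of what LANG and LC_ALL say.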
Mike Shay