public inbox for cygwin@cygwin.com
From: "Michael Shay" <MShay@ABINITIO.COM>
To: cygwin@cygwin.com
Subject: Re: Trouble with output character sets from Win32 applications running under mintty
Date: Mon, 3 Aug 2020 18:05:18 -0400
Message-ID: <OFE0AAB507.AC9FD3B4-ON852585B9.0076DEA7-852585B9.007955FA@abinitio.com>
In-Reply-To: <d8133245-02f0-71a7-e409-bf3b82fc7756@SystematicSw.ab.ca>

Michael



From:   "Brian Inglis" <Brian.Inglis@SystematicSw.ab.ca>
To:     cygwin@cygwin.com
Date:   08/03/2020 05:23 PM
Subject:        Re: Trouble with output character sets from Win32 
applications running under mintty
Sent by:        "Cygwin" <cygwin-bounces@cygwin.com>



On 2020-08-03 11:42, Andrey Repin wrote:
>> Doesn't help. I tried 65001 (UTF-8):
> 
> Because you're confusing things.
> chcp has nothing to do with LANG or LC_*.
> Et vice versa.
> 
> chcp sets console code page for native console applications. Only for those
> supporting it. Many do not.
> LANG sets output parameters for Cygwin applications (and other programs that
> look for it, but these are few).

You cut the significant statement at the top of the OP:

>> I'm having a problem with Cygwin 3.1.4, changing the character set on the
>> fly. It seems to work with Cygwin applications, but not with Win32
>> applications.

He has problems with invalid characters only running win32 console applications:
I changed the subject to hopefully better reflect the issue.

I am unsure where Cygwin 3.1.4 comes into Win32 applications - you have to use
the Windows codepage conversion routines.

You can only change input character sets on the fly; output character sets will
depend on mintty support of xterm-compatible character set support and switching
escape sequences; if you set up UCS16LE console output, Windows and mintty
should handle it.

Perhaps a better description of your environment, build tools, what you are
trying to do, what you expect as output, and what you are getting as output,
could help us better understand and help with the issue you see.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]

The script I sent changes the locale information: LANG and LC_ALL are both
set to en_US.CP1252, i.e.

export LANG="en_US.CP1252"
export LC_ALL=en_US.CP1252

Then it runs a simple Win32 program that takes a single input argument,
ZÇ. The second character is C-cedilla, an 8-bit character with hex value
0xC7. The Win32 program transcodes the Unicode argument it receives, using
the Cygwin character set to determine the codepage (1252), and prints the
transcoded characters to stdout. The result should be ZÇ, identical to the
input argument. This works fine using Cygwin 1.7.28.

Cygwin 3.1.4 launches the Win32 application, so it is responsible for
transcoding the arguments passed to it by mksh, in this case the CP1252
characters ZÇ, into Unicode. That means Cygwin has to use the mb-to-wc
function for codepage 1252. It does not. It uses the UTF-8 to Unicode
function (I've seen this using gdb). That function flags the Ç as an
invalid UTF-8 sequence, which is not surprising, since the byte 0xC7 on its
own is not a valid UTF-8 character. No matter what character set I use in
'export LANG...' and 'export LC_ALL...', Cygwin 3.1.4 always uses the
utf8-to-wc transcoding function in sys_mbstowcs(). Cygwin 1.7.28 uses the
correct function.

I'm not using mintty. I'm using mksh, which is a requirement: our software
uses lots of shell scripts, and legacy support means using a Korn shell. I
could understand it if 1.7.28 didn't do the proper transcoding, but it does.

I used:

        gdb mksh

to load mksh into the debugger, then started it with

        start -c 'cygtest.exe ZÇ'

That allowed me to step into child_info_spawn::worker() and stop at the 
call to CreateProcess(), where the command line (cygtest.exe) and argument 
(ZÇ) are translated into Unicode.

This is the code to which I'm referring, in strfuncs.cc, which is supposed 
to translate the command line and arguments from CP 1252 into Unicode.

  size_t __reg3
  sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
  {
    mbtowc_p f_mbtowc = __MBTOWC;
    if (f_mbtowc == __ascii_mbtowc)
      {
        /* <<<< The code changes '__ascii_mbtowc' to '__utf8_mbtowc'
           every time, regardless of the codepage.  */
        f_mbtowc = __utf8_mbtowc;
      }
    return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
  }

So 'f_mbtowc' is set to '__ascii_mbtowc', the default, and is then replaced
by '__utf8_mbtowc'.

You said:

> You can only change input character sets on the fly;

The input character set to Cygwin should have been changed to CP 1252, as
it was in 1.7.28. At least, that's what I would expect to happen. If it
is not, or if mintty is required, then that's a regression from 1.7.28.

Mike Shay
