From: "Michael Shay"
To: cygwin@cygwin.com
Subject: Re: Trouble with character sets
Date: Mon, 3 Aug 2020 13:10:49 -0400

Doesn't help. I tried 65001 (UTF-8):

### SET CP TO UTF-8, 65001
$cygwin_charset_test.ksh
Old CP 65001

locale on entry
LANG=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=

### CP SET TO 65001
Active code page: 65001

locale changed to
LANG=en_US.CP1252
LC_CTYPE="en_US.CP1252"
LC_NUMERIC="en_US.CP1252"
LC_TIME="en_US.CP1252"
LC_COLLATE="en_US.CP1252"
LC_MONETARY="en_US.CP1252"
LC_MESSAGES="en_US.CP1252"
LC_ALL=en_US.CP1252

Running WIN32 pgm
Transcoding using Cygwin codepage: 1252
Input widechar string:
lpw[0] = Z - 5A
lpw[1] = - F0C7
wmain: Z?
Active code page: 65001

and 1252

### SET CP TO 1252
$cygwin_charset_test.ksh
Old CP 65001

locale on entry
LANG=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=

### CP SET TO 1252
Active code page: 1252

locale changed to
LANG=en_US.CP1252
LC_CTYPE="en_US.CP1252"
LC_NUMERIC="en_US.CP1252"
LC_TIME="en_US.CP1252"
LC_COLLATE="en_US.CP1252"
LC_MONETARY="en_US.CP1252"
LC_MESSAGES="en_US.CP1252"
LC_ALL=en_US.CP1252

Running WIN32 pgm
Transcoding using Cygwin codepage: 1252
Input widechar string:
lpw[0] = Z - 5A
lpw[1] = - F0C7
wmain: Z?
Active code page: 65001

Michael

From: "Brian Inglis"
To: cygwin@cygwin.com
Date: 08/03/2020 12:31 PM
Subject: Re: Trouble with character sets
Sent by: "Cygwin"

On 2020-08-03 09:36, Michael Shay via Cygwin wrote:
> I'm having a problem with Cygwin 3.1.4, changing the character set on the
> fly. It seems to work with Cygwin applications, but not with Win32
> applications.
>
> I have a Korn shell script:
>
> #!/bin/ksh
> OLD_LANG="$LANG"
> OLD_LC_ALL="$LC_ALL"
> echo "locale on entry"
> locale
> echo ""
> export LANG="en_US.CP1252"
> export LC_ALL=en_US.CP1252
> echo "locale changed to"
> locale
> echo ""
> # Default is to run the Win32 program. Input any argument other than 'WIN32'
> # to run '/bin/echo'.
> case $# in
>   0 ) echo "Running WIN32 pgm"
>       ksh -c 'cygtest.exe ZÇ'
>       ;;
>   1 ) echo "Running Cygwin 'echo'"
>       ksh -c '/bin/echo ZÇ'
>       ;;
>   2 ) echo "Running WIN32 pgm"
>       ksh -c 'cygtest.exe ZÇ'
>       echo ""
>       echo "Running Cygwin 'echo'"
>       ksh -c '/bin/echo ZÇ'
>       ;;
>   * ) ;;
> esac
> LC_ALL="$OLD_LC_ALL"
> LANG="$OLD_LANG"
>
> and a Win32 application (attached file cygtest.cpp)
>
> I used gdb to see what was happening in child_info_spawn::worker(), when a
> Win32 program is started using:
>
>   rc = CreateProcessW (runpath,        /* image name w/ full path */
>                        cmd.wcs (wcmd), /* what was passed to exec */
>                        sa,             /* process security attrs */
>                        sa,             /* thread security attrs */
>                        TRUE,           /* inherit handles */
>                        c_flags,
>                        envblock,       /* environment */
>                        NULL,
>                        &si,
>                        &pi);
>
> Specifically, 'cmd.wcs(wcmd)' invokes:
>
>   wchar_t *wcs (wchar_t *wbuf, size_t n)
>   {
>     if (n == 1)
>       wbuf[0] = L'\0';
>     else
>       sys_mbstowcs (wbuf, n, buf);
>     return wbuf;
>   }
>
> and sys_mbstowcs():
>
>   size_t __reg3
>   sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
>   {
>     mbtowc_p f_mbtowc = __MBTOWC;
>     if (f_mbtowc == __ascii_mbtowc)
>       {
>         f_mbtowc = __utf8_mbtowc;    <<<<< this is ALWAYS done, no matter
>                                            what charset is in use.
>       }
>     return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
>   }
>
> Since the CP1252 is an 8-bit single-byte character set with characters >=
> 0x80, the '0xc7' character is always translated as '0xc7 0xf0', with the
> '0xf0' byte indicating an invalid character in the string.
>
> This doesn't seem to happen when e.g. '/bin/echo' is run, although I
> haven't stepped into the code to see what's happening.
>
> I do not think this is a Cygwin bug, but since the User's Guide says the
> locale and charset can be changed on the fly, I don't know what's going
> awry.
>
> Any suggestions? If you need more information, I'm happy to provide it.

Try:

$ chcp.com
Active code page: 850
$ chcp.com 65001
Active code page: 65001
$ chcp.com
Active code page: 65001

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]