From mboxrd@z Thu Jan 1 00:00:00 1970
To: cygwin@cygwin.com
Subject: Trouble with character sets
From: "Michael Shay"
Date: Mon, 3 Aug 2020 11:36:14 -0400
Content-Type: multipart/mixed; boundary="=_mixed 0055B756852585B9_="
MIME-Version: 1.0
List-Id: General Cygwin discussions and problem reports

--=_mixed 0055B756852585B9_=
Content-Type: text/plain; charset="ISO-8859-1"

I'm having a problem changing the character set on the fly with
Cygwin 3.1.4. It seems to work with Cygwin applications, but not with
Win32 applications. I have a Korn shell script:

#!/bin/ksh
OLD_LANG="$LANG"
OLD_LC_ALL="$LC_ALL"
echo "locale on entry"
locale
echo ""
export LANG="en_US.CP1252"
export LC_ALL=en_US.CP1252
echo "locale changed to"
locale
echo ""
# With no arguments, run the Win32 program. With one argument, run
# '/bin/echo'. With two arguments, run both.
case $# in
0 )
    echo "Running WIN32 pgm"
    ksh -c 'cygtest.exe ZÇ'
    ;;
1 )
    echo "Running Cygwin 'echo'"
    ksh -c '/bin/echo ZÇ'
    ;;
2 )
    echo "Running WIN32 pgm"
    ksh -c 'cygtest.exe ZÇ'
    echo ""
    echo "Running Cygwin 'echo'"
    ksh -c '/bin/echo ZÇ'
    ;;
* )
    ;;
esac
LC_ALL="$OLD_LC_ALL"
LANG="$OLD_LANG"

and a Win32 application (attached file cygtest.cpp).

I used gdb to see what was happening in child_info_spawn::worker(),
when a Win32 program is started using:

      rc = CreateProcessW (runpath,        /* image name w/ full path */
                           cmd.wcs (wcmd), /* what was passed to exec */
                           sa,             /* process security attrs */
                           sa,             /* thread security attrs */
                           TRUE,           /* inherit handles */
                           c_flags,
                           envblock,       /* environment */
                           NULL,
                           &si,
                           &pi);

Specifically, 'cmd.wcs(wcmd)' invokes:

  wchar_t *wcs (wchar_t *wbuf, size_t n)
  {
    if (n == 1)
      wbuf[0] = L'\0';
    else
      sys_mbstowcs (wbuf, n, buf);
    return wbuf;
  }

and sys_mbstowcs():

size_t __reg3
sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
{
  mbtowc_p f_mbtowc = __MBTOWC;
  if (f_mbtowc == __ascii_mbtowc)
  {
    /* <<<<< this is ALWAYS done, no matter what charset is in use. */
    f_mbtowc = __utf8_mbtowc;
  }
  return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
}

Since CP1252 is an 8-bit single-byte character set with characters
>= 0x80, the '0xc7' character is always translated as '0xc7 0xf0', with
the '0xf0' byte indicating an invalid character in the string.

This doesn't seem to happen when e.g. '/bin/echo' is run, although I
haven't stepped into the code to see what's happening.

I do not think this is a Cygwin bug, but since the User's Guide says
the locale and charset can be changed on the fly, I don't know what's
going awry.

Any suggestions? If you need more information, I'm happy to provide it.

Mike Shay

Here's the source for the Win32 program. I built it with Visual Studio
2015, to get something running quickly.

--=_mixed 0055B756852585B9_=
Content-Type: application/octet-stream; name="cygtest.cpp"
Content-Disposition: attachment; filename="cygtest.cpp"

// cygtest.cpp : Defines the entry point for the console application.
//


#include <SDKDDKVer.h>
#include <stdio.h>
#include <windows.h>
#include <string>
using namespace std;

LPSTR __stdcall UnicodeToMByteHelper(LPSTR lpa, int nBytes, LPCWSTR lpw, int nChars, int codepage);

static UINT cyg_codepage_string_to_CP(const string &cp)
{
  const string UTF8 = "UTF-8";
  const string utf8 = "utf-8";
  const string ANSI = "ANSI";
  const string ansi = "ansi";
  const string ISO88591 = "ISO-8859-1";
  const string iso88591 = "iso-8859-1";
  const string OEM = "OEM";
  const string oem = "oem";
  const string WINDOWS = "WINDOWS";
  const string windows = "windows";
  const string CODEPAGE = "CP";
  const string codepage = "cp";
  UINT shell_cp{ 0 };

  if (NULL == cp.c_str() || cp.length() == 0)
    return 0;

  if ((cp.compare(utf8) == 0) || (cp.compare(UTF8) == 0))
    shell_cp = 65001;
  else if ((cp.compare(ansi) == 0) || (cp.compare(ANSI) == 0)
    || (cp.compare(ISO88591) == 0) || (cp.compare(iso88591) == 0))
    shell_cp = 1252;
  // oem is also standard cygwin nomenclature
  else if ((cp.compare(oem) == 0) || (cp.compare(OEM) == 0))
    shell_cp = 437;
  // cpXXX, windows-XXX and windows_XXX are all recognized by
  // the Ab Initio extensions to cygwin.  Not sure if they are
  // known to standard cygwin, but I don't think they are.
  else if ((cp.compare(0, 2, codepage) == 0) ||
    (cp.compare(0, 2, CODEPAGE) == 0) ||
    (cp.compare(0, 7, windows) == 0) ||
    (cp.compare(0, 7, WINDOWS) == 0)) {
    // If the prefix is "CP" or "cp" then get the number after that
    // else it's "WINDOWS{-,_}" or "WINDOWS{-,_}"
    int offset = ((cp.compare(0, 2, codepage) == 0) || (cp.compare(0, 2, CODEPAGE) == 0)) ? 2 : 8;
    shell_cp = atoi(cp.substr(offset).c_str());
  }
  return shell_cp;
}

static UINT get_cygwin_codepage()
{
  string default_cyg_charset = "C.UTF-8";        // Cygwin default character set
  string cyg_locale;
  UINT shell_cp{ 0 };
  UINT default_cp{ 65001 };
  char *envptr = ::getenv("LANG");

  if (NULL == envptr)
    envptr = ::getenv("LC_ALL");

  cyg_locale = (NULL == envptr ? default_cyg_charset : envptr);
  // The 'value' field of the environment string "var_name=value"
  // will be of the form: <language ID>.<codepage ID>
  // We want the substring after the '.'
  int dotPos = cyg_locale.find_first_of('.');
  if (dotPos >= 0) {
    // The character set string, if specified, starts AFTER  the '.'.
    // If NOT specified, return the input default.
    string page = cyg_locale.substr(++dotPos);
    if (0 <= (shell_cp = cyg_codepage_string_to_CP(page))) {
      return shell_cp;
    }  // end SHELL_CP
  }    // end EQPOS
  return default_cp;
}


LPSTR __stdcall UnicodeToMByteHelper(LPSTR lpa, int nBytes, LPCWSTR lpw, int nChars, int codepage)
{
  static int printInfo = 0;
  int nOut = 0;

  if (NULL == lpa) {
    printf("NULL input string\n");
    return NULL;
  }

  if (printInfo) {
    printf("Transcoding using Cygwin codepage: %d\nInput widechar string:\n", codepage);
    for (int i = 0; i < nChars; i++)
      printf("\tlpw[%d] = %C - %02X\n", i, lpw[i], lpw[i]);
  }
  ++printInfo;

  if (nChars > 0) {
    if (0 == (nOut = WideCharToMultiByte(codepage, 0, lpw, nChars, lpa, nBytes, NULL, NULL))) {
      DWORD dwErr = GetLastError();
      printf("WideCharToMultiByte(%d, %S) failed, error %d\n", codepage, lpw, dwErr);
      return NULL;
    }
  }
  lpa[nOut] = '\0';
  return lpa;
}

int wmain(int argc, wchar_t** wargv)
{
  try {
    char *pNull = "NULL";
    char** argv = new char*[(argc)+1];
    int _argi;
    int codepage = get_cygwin_codepage();
    for (_argi = 0; _argi < (argc); _argi++) {
      if (wargv[_argi]) {
        LPWSTR utf_lpw  = wargv[_argi];
        int utf_len     = lstrlenW(utf_lpw);
        int utf_convert = utf_len * 3 + 1;
        LPSTR utf_lpa   = (LPSTR)_alloca(utf_convert);
        argv[_argi]     = UnicodeToMByteHelper(utf_lpa, utf_convert, utf_lpw, utf_len, codepage);
      }
      else {
        argv[_argi] = pNull;
      }
    }
    argv[(argc)] = NULL;

    // Now print the transcoded string.

    for (int i = 1; i < argc; i++)
      printf("%s: %s\n", __FUNCTION__, argv[i]);

    return 0;
  }
  catch (...) {
    printf("Caught unhandled exception\n");
  }
}

--=_mixed 0055B756852585B9_=--