From: Liu Hao
To: Roy Qu, gcc@gcc.gnu.org
Subject: Re: gcc can't process some utf-8 characters
Date: Thu, 14 Jan 2021 11:27:31 +0800

On 2021/1/14 at 9:47 AM, Roy Qu via Gcc wrote:
> I use "gcc -finput-charset=utf-8 -fexec-charset=gb2312" to compile utf-8
> encoding source files under windows. Most of the time it works well, but
> when the source file contains some characters such as "—", gcc will fail
> and the error message is: "[Error] converting to execution character set:
> Illegal byte sequence".
>
> The attached file is an example. I have tested the file by using iconv to
> convert it from utf-8 to gbk, and iconv works with no complaints.
>

It looks like this is a bug in iconv. Converting the attached source with
`iconv -f utf-8 -t gb2312 testencoding.cpp` gives the same error.

According to the GB2312 code table [1], the EM DASH symbol (U+2014) should
map to the double-byte sequence `A1 AA`. There is no difference among
GB2312, GBK and GB18030 in this respect.

Please consider GB2312 superseded by GBK. The native code page (936)
references GBK instead of GB2312.

[1] http://www.khngai.com/chinese/charmap/

> So maybe there's something wrong when gcc is trying to do the encoding
> conversion?
>
> Some information:
> Toolchain: MinGW-W64-i686, gcc 10.2
> System: Windows 10 Simplified Chinese Home edition ver 2004
>

-- 
Best regards,
LH_Mouse
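
The mapping discussed above (U+2014 to `A1 AA`) can be checked outside of gcc and iconv. The following is only a minimal sketch, assuming Python 3.8 or later; `try_encode` and `report` are hypothetical helper names, and Python's built-in codec tables are not necessarily the ones iconv or the MinGW-W64 toolchain uses. It reports what each charset name does with EM DASH rather than assuming the result, since a converter's strict `gb2312` table may reject a character that its `gbk` table accepts.

```python
# Sketch: see how U+2014 (EM DASH) encodes under each charset name.
# Assumption: Python's codecs stand in for the converter; iconv may differ.
EM_DASH = "\u2014"


def try_encode(text, codec):
    """Return the encoded bytes as spaced upper-case hex, or None if rejected."""
    try:
        return text.encode(codec).hex(" ").upper()
    except UnicodeEncodeError:
        # The codec has no mapping for at least one character in `text`.
        return None


def report(text=EM_DASH, codecs=("gb2312", "gbk", "gb18030")):
    """Map each codec name to its hex encoding of `text` (None if rejected)."""
    return {codec: try_encode(text, codec) for codec in codecs}


if __name__ == "__main__":
    for codec, result in report().items():
        shown = result if result is not None else "not encodable"
        print(f"{codec:8} -> {shown}")
```

If the `gb2312` row comes back as not encodable while `gbk` gives `A1 AA`, that would be consistent with the report that converting to gbk works while converting to gb2312 fails, and passing `-fexec-charset=gbk` would be the thing to try.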