From: Roy Qu
Date: Thu, 14 Jan 2021 09:47:59 +0800
Subject: gcc can't process some utf-8 characters
To: gcc@gcc.gnu.org

I use "gcc -finput-charset=utf-8 -fexec-charset=gb2312" to compile UTF-8 encoded source files under Windows. Most of the time it works well, but when the source file contains certain characters such as "—" (U+2014), gcc fails with this error message:

"[Error] converting to execution character set: Illegal byte sequence"

The attached file is an example. I have tested the file by using iconv to convert it from utf-8 to gbk, and iconv works with no complaints. So maybe something goes wrong when gcc does the encoding conversion?

Some information:
Toolchain: MinGW-W64-i686, gcc 10.2
System: Windows 10 Simplified Chinese Home edition ver 2004

Attachment: testencoding.cpp

    #include <iostream>

    using namespace std;
    int main(){
        cout<<"哈哈哈\n"<<endl;
        cout << "1—修改名字, 2—修改年龄, 3—修改工牌号,0—返回" <<endl;
        return 0;
    }