From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
From: Roy Qu <royqh1979@gmail.com>
Date: Thu, 14 Jan 2021 09:47:59 +0800
Message-ID: <CAKjWZNXrBreBV+3M1M-GY6NEzccPbepmtUydZ1-71u+_bppUjw@mail.gmail.com>
Subject: gcc can't process some utf-8 characters
To: gcc@gcc.gnu.org
Content-Type: multipart/mixed; boundary="000000000000a8208605b8d27221"
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>

--000000000000a8208605b8d27221
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

I use "gcc -finput-charset=3Dutf-8 -fexec-charset=3Dgb2312" to compile
UTF-8 encoded source files under Windows. Most of the time this works,
but when a source file contains certain characters, such as "=E2=80=94"
(U+2014, EM DASH), gcc fails with the error message:
"[Error] converting to execution character set: Illegal byte sequence".

The attached file reproduces the problem. Converting it from UTF-8 to GBK
with iconv succeeds without any complaints.

So perhaps something goes wrong when gcc performs the encoding
conversion?

Environment:
Toolchain: MinGW-W64-i686, gcc 10.2
System: Windows 10 Home (Simplified Chinese), version 2004

--000000000000a8208605b8d27221
Content-Type: application/octet-stream; name="testencoding.cpp"
Content-Disposition: attachment; filename="testencoding.cpp"
Content-Transfer-Encoding: base64
Content-ID: <f_kjw6qmbr0>
X-Attachment-Id: f_kjw6qmbr0

I2luY2x1ZGUgPGlvc3RyZWFtPg0KDQp1c2luZyBuYW1lc3BhY2Ugc3RkOw0KaW50IG1haW4oKXsN
CiAgICBjb3V0PDwi5ZOI5ZOI5ZOIXG4iPDxlbmRsOw0KICAgIGNvdXQgPDwgIjHigJTkv67mlLnl
kI3lrZcsIDLigJTkv67mlLnlubTpvoQsIDPigJTkv67mlLnlt6XniYzlj7csMOKAlOi/lOWbniIg
PDxlbmRsOw0KICAgIHJldHVybiAwOyANCn0NCg==
--000000000000a8208605b8d27221--