gcc can't process some utf-8 characters

public inbox for gcc@gcc.gnu.org

* gcc can't process some utf-8 characters
@ 2021-01-14  1:47 Roy Qu
  2021-01-14  3:27 ` Liu Hao
  0 siblings, 1 reply; 2+ messages in thread
From: Roy Qu @ 2021-01-14  1:47 UTC (permalink / raw)
  To: gcc

I use "gcc -finput-charset=utf-8 -fexec-charset=gb2312" to compile UTF-8
encoded source files on Windows. Most of the time it works well, but
when the source file contains certain characters such as "—", gcc fails
with the error message: "[Error] converting to execution character set:
Illegal byte sequence".

The attached file is an example. I have tested the file by using iconv to
convert it from UTF-8 to GBK, and iconv converts it without complaint.

So maybe there's something wrong when gcc is trying to do the encoding
conversion?

Some information:
Toolchain: MinGW-W64-i686, gcc 10.2
System: Windows 10 Simplified Chinese Home edition ver 2004

[-- Attachment #2: testencoding.cpp --]
[-- Type: application/octet-stream, Size: 199 bytes --]

#include <iostream>

using namespace std;
int main(){
    cout<<"哈哈哈\n"<<endl;
    cout << "1—修改名字, 2—修改年龄, 3—修改工牌号,0—返回" <<endl;
    return 0; 
}

* Re: gcc can't process some utf-8 characters
  2021-01-14  1:47 gcc can't process some utf-8 characters Roy Qu
@ 2021-01-14  3:27 ` Liu Hao
  0 siblings, 0 replies; 2+ messages in thread
From: Liu Hao @ 2021-01-14  3:27 UTC (permalink / raw)
  To: Roy Qu, gcc


On 2021/1/14 at 9:47 AM, Roy Qu via Gcc wrote:
> I use "gcc -finput-charset=utf-8 -fexec-charset=gb2312" to compile UTF-8
> encoded source files on Windows. Most of the time it works well, but
> when the source file contains certain characters such as "—", gcc fails
> with the error message: "[Error] converting to execution character set:
> Illegal byte sequence".
> 
> The attached file is an example. I have tested the file by using iconv to
> convert it from UTF-8 to GBK, and iconv converts it without complaint.
> 

It looks like this is a bug in iconv. Converting the attached source with
`iconv -f utf-8 -t gb2312 testencoding.cpp` gives the same error.

According to the GB2312 code table [1], the EM DASH symbol (U+2014) should map to the double-byte
sequence `A1 AA`. For this character there is no difference among GB2312, GBK and GB18030.
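
To double-check which code point the source file actually contains (U+2014 EM DASH versus the
similar-looking U+2015 HORIZONTAL BAR), the UTF-8 bytes can be decoded by hand. The following is
only a minimal sketch: `first_code_point` is a hypothetical helper, not part of gcc or iconv, and
it does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode the first UTF-8 sequence in `s` into a
// Unicode code point. Returns 0xFFFFFFFF for empty, truncated or
// malformed input. Overlong forms and surrogates are NOT rejected.
std::uint32_t first_code_point(const std::string& s) {
    const std::uint32_t bad = 0xFFFFFFFFu;
    if (s.empty()) return bad;

    const unsigned char b0 = static_cast<unsigned char>(s[0]);
    std::uint32_t cp = 0;
    std::size_t extra = 0;  // number of continuation bytes expected

    if (b0 < 0x80)                { cp = b0;        extra = 0; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
    else return bad;  // stray continuation byte or invalid lead byte

    if (s.size() < extra + 1) return bad;  // truncated sequence

    for (std::size_t i = 1; i <= extra; ++i) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return bad;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3Fu);
    }
    return cp;
}
```

For the dash in the attached file, whose UTF-8 bytes are E2 80 94,
`first_code_point("\xE2\x80\x94")` yields 0x2014.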

Please treat GB2312 as superseded by GBK. The native Windows code page (936) corresponds to GBK, not GB2312.
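
A rough way to compare the two target encodings outside of gcc is to call iconv(3) directly. The
sketch below assumes a POSIX-style `iconv.h` (glibc, musl or GNU libiconv; the latter may need
`-liconv` at link time) and that the installed implementation recognizes the encoding names
"UTF-8", "GBK" and "GB2312". `convert` is a hypothetical helper name.

```cpp
#include <iconv.h>

#include <cstddef>
#include <string>

// Hypothetical helper: convert `input` from encoding `from` to encoding
// `to` with iconv(3). Returns true and fills `out` on success. Returns
// false if the converter cannot be opened or if iconv reports an error,
// e.g. EILSEQ for a character the target encoding cannot represent.
bool convert(const char* from, const char* to,
             const std::string& input, std::string& out) {
    iconv_t cd = iconv_open(to, from);  // note the order: target first
    if (cd == (iconv_t)-1) return false;

    std::string in_copy = input;        // iconv takes a non-const char**
    char* in_ptr = &in_copy[0];
    std::size_t in_left = in_copy.size();

    // Generous output buffer; 4 bytes per input byte is more than
    // enough for the encodings discussed here.
    std::string buf(input.size() * 4 + 16, '\0');
    char* out_ptr = &buf[0];
    std::size_t out_left = buf.size();

    const std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (std::size_t)-1) return false;

    out.assign(buf.data(), buf.size() - out_left);
    return true;
}
```

Calling `convert("UTF-8", "GBK", "\xE2\x80\x94", out)` and then the same with "GB2312" as the
target shows whether the installed tables disagree about U+2014. Where the GBK table follows code
page 936, the GBK call is expected to produce the bytes A1 AA; the GB2312 result depends on that
implementation's table.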


[1] http://www.khngai.com/chinese/charmap/


> So maybe there's something wrong when gcc is trying to do the encoding
> conversion?
> 
> Some information:
> Toolchain: MinGW-W64-i686, gcc 10.2
> System: Windows 10 Simplified Chinese Home edition ver 2004
> 


-- 
Best regards,
LH_Mouse

