From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 8755 invoked by alias); 4 Aug 2010 12:39:42 -0000
Received: (qmail 8706 invoked by uid 48); 4 Aug 2010 12:39:26 -0000
Date: Wed, 04 Aug 2010 12:39:00 -0000
Message-ID: <20100804123926.8705.qmail@sourceware.org>
X-Bugzilla-Reason: CC
Subject: [Bug fortran/45179] Support UTF-8 (and other encodings) in the source file (.f90) for CHARACTER(kind=4)
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "burnus at gcc dot gnu dot org"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2010-08/txt/msg00212.txt.bz2

------- Comment #2 from burnus at gcc dot gnu dot org  2010-08-04 12:39 -------
Created an attachment (id=21393)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21393&action=view)
Support -finput-charset= (accepts option, but does not fix the issue)

This patch makes gfortran accept -finput-charset=, but it does not fix the
actual problem.

scanner.c has load_line, which works with wide strings but reads the input
one byte at a time via getc().

Maybe one could implement UTF-8 reading for strings in scanner.c and use
libcpp (-cpp) to convert the input to UTF-8 if it is in a different encoding.

http://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgfortran/io/read.c;hb=HEAD#l236

If one uses CPP, one probably needs to set wide_charset to UTF-8 internally,
so that CPP outputs UTF-8.

Possible work plan:

a) If -finput-charset= is used but CPP is not: allow UTF-8 but print an error
   for other encodings. Read the input internally as UTF-8.

b) If -finput-charset= is used with -cpp: allow it and set the execution
   encoding (or whatever it is called) to UTF-8, so that libcpp returns a
   UTF-8 string to gfortran.

c) By default, continue to use the current input method, i.e. a simple getc().

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45179
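A minimal sketch of the UTF-8 reading suggested for scanner.c: rather than treating every getc() result as one character, a helper consumes one to four bytes and returns a single 32-bit code point, the width CHARACTER(kind=4) uses. The name `read_utf8_char` is hypothetical (not an existing gfortran function), and mapping malformed input to U+FFFD is an assumption; gfortran would presumably want a proper diagnostic instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read one UTF-8 encoded character from FP with
   getc() and return its code point.  Returns EOF at end of input and
   0xFFFD (REPLACEMENT CHARACTER) for malformed sequences.  */
static int32_t
read_utf8_char (FILE *fp)
{
  assert (fp != NULL);

  int c = getc (fp);
  if (c == EOF)
    return EOF;

  uint32_t cp;     /* Code point being assembled.  */
  int extra;       /* Continuation bytes still to read.  */
  uint32_t min;    /* Smallest code point valid for this length.  */

  if (c < 0x80)
    return c;                           /* 1 byte: ASCII.  */
  else if ((c & 0xE0) == 0xC0)
    { cp = c & 0x1F; extra = 1; min = 0x80; }
  else if ((c & 0xF0) == 0xE0)
    { cp = c & 0x0F; extra = 2; min = 0x800; }
  else if ((c & 0xF8) == 0xF0)
    { cp = c & 0x07; extra = 3; min = 0x10000; }
  else
    return 0xFFFD;                      /* Stray continuation or bad lead byte.  */

  while (extra-- > 0)
    {
      c = getc (fp);
      if (c == EOF || (c & 0xC0) != 0x80)
        {
          if (c != EOF)
            ungetc (c, fp);             /* Let the next call see this byte.  */
          return 0xFFFD;
        }
      cp = (cp << 6) | (uint32_t) (c & 0x3F);
    }

  /* Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.  */
  if (cp < min || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
    return 0xFFFD;
  return (int32_t) cp;
}
```

In a load_line-style loop, each call could take the place of one getc() call, with EOF still marking the end of the input.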
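The work plan reduces to a decision on two inputs: whether -finput-charset= was given, and whether -cpp is in effect. The sketch below encodes items a) to c). The enum, the function names and the accepted spellings of UTF-8 ("UTF-8", "UTF8", case-insensitive) are hypothetical, not existing gfortran code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical input strategies, one per item of the work plan.  */
enum input_mode
{
  INPUT_GETC,         /* c) default: keep the plain getc() reading.         */
  INPUT_UTF8_DIRECT,  /* a) -finput-charset=UTF-8 without -cpp.             */
  INPUT_VIA_LIBCPP,   /* b) -finput-charset= with -cpp: libcpp emits UTF-8.  */
  INPUT_ERROR         /* a) other charset without -cpp: print an error.     */
};

/* ASCII case-insensitive string equality.  */
static bool
ascii_iequal (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  for (; *a != '\0' && *b != '\0'; a++, b++)
    if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
      return false;
  return *a == *b;                    /* Equal only if both ended together.  */
}

/* INPUT_CHARSET is the argument of -finput-charset=, or NULL if the
   option was not given; USE_CPP says whether -cpp is in effect.  */
static enum input_mode
choose_input_mode (const char *input_charset, bool use_cpp)
{
  if (input_charset == NULL)
    return INPUT_GETC;
  if (use_cpp)
    return INPUT_VIA_LIBCPP;
  if (ascii_iequal (input_charset, "UTF-8")
      || ascii_iequal (input_charset, "UTF8"))
    return INPUT_UTF8_DIRECT;
  return INPUT_ERROR;
}
```

Keeping the decision in one function means the default path (no -finput-charset=) stays exactly the current getc() reading, whether or not -cpp is used.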
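Converting "the input encoding to UTF-8", as proposed for the libcpp route, is a plain charset conversion of the kind POSIX iconv(3) performs. The standalone sketch below converts a buffer from a named encoding to UTF-8. `to_utf8` is a hypothetical helper for illustration, not libcpp's implementation. With glibc, iconv is part of libc; other systems may need -liconv.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert LEN bytes at IN from the encoding named
   FROM (e.g. "ISO-8859-1") into a newly malloc'd, NUL-terminated UTF-8
   string.  Returns NULL if the charset is unknown, the input is invalid
   or memory runs out.  The caller frees the result.  */
static char *
to_utf8 (const char *from, const char *in, size_t len)
{
  assert (from != NULL && in != NULL);

  iconv_t cd = iconv_open ("UTF-8", from);
  if (cd == (iconv_t) -1)
    return NULL;                        /* Conversion not supported.  */

  size_t cap = 4 * len + 1;             /* Initial guess; grown on E2BIG.  */
  char *out = malloc (cap);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *inp = (char *) in;              /* POSIX iconv() takes char **.  */
  size_t inleft = len;
  char *outp = out;
  size_t outleft = cap - 1;             /* Keep one byte for the NUL.  */

  while (inleft > 0)
    {
      if (iconv (cd, &inp, &inleft, &outp, &outleft) != (size_t) -1)
        break;                          /* All input converted.  */
      if (errno != E2BIG)
        {                               /* EILSEQ or EINVAL: bad input.  */
          free (out);
          iconv_close (cd);
          return NULL;
        }
      /* Output buffer full: double it and resume where iconv stopped.  */
      size_t used = (size_t) (outp - out);
      char *bigger = realloc (out, 2 * cap);
      if (bigger == NULL)
        {
          free (out);
          iconv_close (cd);
          return NULL;
        }
      out = bigger;
      cap *= 2;
      outp = out + used;
      outleft = cap - 1 - used;
    }

  *outp = '\0';
  iconv_close (cd);
  return out;
}
```

For example, `to_utf8 ("ISO-8859-1", "caf\xE9", 4)` yields the UTF-8 bytes `caf\xC3\xA9`, which a UTF-8 reader in scanner.c could then decode into CHARACTER(kind=4) code points.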