From: Florian Weimer
To: glibc-cvs@sourceware.org
Subject: [glibc/fw/localedef-utf8] locale: localedef input files are now encoded in UTF-8
Date: Tue, 17 May 2022 09:57:41 +0000 (GMT)
X-Git-Refname: refs/heads/fw/localedef-utf8
X-Git-Oldrev: de0b9d66446c553bdbae2c15a63ef8eb5f819d1d
X-Git-Newrev: 0c34593491e4ea2de79ae85fedb26252529b5f35

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0c34593491e4ea2de79ae85fedb26252529b5f35

commit 0c34593491e4ea2de79ae85fedb26252529b5f35
Author: Florian Weimer
Date:   Tue May 17 11:38:29 2022 +0200

    locale: localedef input files are now encoded in UTF-8

Diff:
---
 locale/programs/linereader.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/locale/programs/linereader.c b/locale/programs/linereader.c
index ca4abb031c..485ccaff0a 100644
--- a/locale/programs/linereader.c
+++ b/locale/programs/linereader.c
@@ -688,7 +688,11 @@ get_string (struct linereader *lr, const struct charmap_t *charmap,
       buf2 = NULL;
 
       while ((ch = lr_getc (lr)) != '"' && ch != '\n' && ch != EOF)
-        addc (&lrb, ch);
+        {
+          if (ch >= 0x80)
+            lr_error (lr, _("illegal 8-bit character in untranslated string"));
+          addc (&lrb, ch);
+        }
 
       /* Catch errors with trailing escape character.  */
       if (lrb.act > 0 && lrb.buf[lrb.act - 1] == lr->escape_char
@@ -733,13 +737,35 @@ get_string (struct linereader *lr, const struct charmap_t *charmap,
           if (ch == lr->escape_char)
             {
               ch = lr_getc (lr);
+              if (ch >= 0x80)
+                {
+                  lr_error (lr, _("illegal 8-bit escape sequence"));
+                  illegal_string = true;
+                  break;
+                }
               if (ch == '\n' || ch == EOF)
                 break;
             }
+          else if (ch < 0x80)
+            {
+              wch = ch;
+              addc (&lrb, ch);
+            }
+          else /* UTF-8 sequence.  */
+            {
+              if (!get_string_decode_utf8 (lr, ch, &wch))
+                {
+                  illegal_string = true;
+                  break;
+                }
+              get_string_U_char (locale, charmap, repertoire, wch,
+                                 &lrb, &illegal_string);
+              if (illegal_string)
+                break;
+            }
 
-          addc (&lrb, ch);
           if (return_widestr)
-            ADDWC ((uint32_t) ch);
+            ADDWC (wch);
 
           continue;
         }
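The helper get_string_decode_utf8 called in the second hunk is defined outside this diff, so its exact behaviour is not visible here. The following self-contained C sketch is an illustration only. It shows one way to decode a UTF-8 sequence after the lead byte (ch >= 0x80) has already been read, pulling the continuation bytes from a stream in the same way the patched loop uses lr_getc. All names in the sketch (struct byte_source, source_getc, decode_utf8_tail, decode_all) are hypothetical and are not glibc code. The rejection rules follow the well-formed UTF-8 definition in RFC 3629: invalid lead bytes, missing continuation bytes, overlong forms, surrogates, and values above U+10FFFF. Treating these as the checks the real helper performs is an assumption. Charmap translation through get_string_U_char is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte stream standing in for struct linereader.  */
struct byte_source
{
  const unsigned char *data;
  size_t len;
  size_t pos;
};

/* Stand-in for lr_getc: next byte (0..255), or -1 at end of input.  */
static int
source_getc (struct byte_source *src)
{
  if (src->pos >= src->len)
    return -1;
  return src->data[src->pos++];
}

/* Decode one UTF-8 sequence whose lead byte LEAD (>= 0x80) has already
   been consumed.  Continuation bytes are read from SRC; on success the
   code point is stored in *WCH.  Returns false for ill-formed input.
   An offending continuation byte is consumed, not pushed back.  */
static bool
decode_utf8_tail (struct byte_source *src, int lead, uint32_t *wch)
{
  int count;          /* Continuation bytes still expected.  */
  uint32_t value;
  uint32_t minimum;   /* Smallest code point valid for this length.  */

  if (lead >= 0xc2 && lead <= 0xdf)
    { count = 1; value = lead & 0x1f; minimum = 0x80; }
  else if (lead >= 0xe0 && lead <= 0xef)
    { count = 2; value = lead & 0x0f; minimum = 0x800; }
  else if (lead >= 0xf0 && lead <= 0xf4)
    { count = 3; value = lead & 0x07; minimum = 0x10000; }
  else
    /* 0x80..0xbf are stray continuation bytes; 0xc0, 0xc1 and
       0xf5..0xff never start a well-formed sequence.  */
    return false;

  for (int i = 0; i < count; ++i)
    {
      int ch = source_getc (src);
      if (ch < 0 || (ch & 0xc0) != 0x80)
        return false;   /* Truncated, or not a continuation byte.  */
      value = (value << 6) | (uint32_t) (ch & 0x3f);
    }

  if (value < minimum                           /* Overlong form.  */
      || (value >= 0xd800 && value <= 0xdfff)   /* UTF-16 surrogate.  */
      || value > 0x10ffff)                      /* Beyond Unicode.  */
    return false;

  *wch = value;
  return true;
}

/* Mirrors the ASCII / UTF-8 split in the patched loop: decode all of
   SRC into OUT (capacity CAP).  Returns the number of code points, or
   -1 on an ill-formed sequence or if OUT is too small.  */
static long
decode_all (struct byte_source *src, uint32_t *out, size_t cap)
{
  size_t n = 0;
  int ch;

  while ((ch = source_getc (src)) >= 0)
    {
      uint32_t wch;
      if (ch < 0x80)
        wch = (uint32_t) ch;                   /* ASCII passes through.  */
      else if (!decode_utf8_tail (src, ch, &wch))
        return -1;                   /* Cf. illegal_string = true; break;  */
      if (n == cap)
        return -1;
      out[n++] = wch;
    }
  return (long) n;
}
```

The sketch keeps the decoded value in a separate uint32_t, as the patch does when it replaces ADDWC ((uint32_t) ch) with ADDWC (wch). Once multi-byte input is accepted, the last byte read is no longer the character itself, so the code point must be tracked independently of ch.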