public inbox for gcc-cvs@sourceware.org
* [gcc r14-7703] gccrs: fix tokenizing utf-8 whitespaces
From: Arthur Cohen @ 2024-01-16 17:55 UTC
To: gcc-cvs
https://gcc.gnu.org/g:84a14f3d88f2568af7922c47f43960bde3904205
commit r14-7703-g84a14f3d88f2568af7922c47f43960bde3904205
Author: Raiki Tamura <tamaron1203@gmail.com>
Date: Wed Jun 28 19:14:50 2023 +0900
gccrs: fix tokenizing utf-8 whitespaces
gcc/rust/ChangeLog:
* lex/rust-lex.cc (Lexer::build_token): Add check for all kinds of whitespace.
gcc/testsuite/ChangeLog:
* rust/compile/torture/utf8_whitespaces.rs: New test.
Signed-off-by: Raiki Tamura <tamaron1203@gmail.com>
Diff:
---
gcc/rust/lex/rust-lex.cc | 13 +++++++++++--
gcc/testsuite/rust/compile/torture/utf8_whitespaces.rs | 16 ++++++++++++++++
2 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/gcc/rust/lex/rust-lex.cc b/gcc/rust/lex/rust-lex.cc
index aec2a96694a..7f7fc0c80bf 100644
--- a/gcc/rust/lex/rust-lex.cc
+++ b/gcc/rust/lex/rust-lex.cc
@@ -420,7 +420,10 @@ Lexer::build_token ()
{
/* ignore whitespace characters for tokens but continue updating
* location */
- case '\n': // newline
+ case '\n': // newline
+ case 0x0085: // next line
+ case 0x2028: // line separator
+ case 0x2029: // paragraph separator
current_line++;
current_column = 1;
// tell line_table that new line starts
@@ -432,10 +435,16 @@ Lexer::build_token ()
case ' ': // space
current_column++;
continue;
- case '\t': // tab
+ case '\t': // horizontal tab
// width of a tab is not well-defined, assume 8 spaces
current_column += 8;
continue;
+ case '\v': // vertical tab
+ case 0x000c: // form feed
+ case 0x200e: // left-to-right mark
+ case 0x200f: // right-to-left mark
+ // Ignored.
+ continue;
// punctuation - actual tokens
case '=':
diff --git a/gcc/testsuite/rust/compile/torture/utf8_whitespaces.rs b/gcc/testsuite/rust/compile/torture/utf8_whitespaces.rs
new file mode 100644
index 00000000000..b45c014812f
--- /dev/null
+++ b/gcc/testsuite/rust/compile/torture/utf8_whitespaces.rs
@@ -0,0 +1,16 @@
+fn main() {
+ // FORM FEED
+ \f
+ // LINE TABULATION (vt)
+ \v
+ // NEXT LINE (nel)
+
+ // LEFT-TO-RIGHT MARK
+
+ // RIGHT-TO-LEFT MARK
+
+ // LINE SEPARATOR
+
+ // PARAGRAPH SEPARATOR
+
+}
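
For readers who want the effect of the rust-lex.cc hunks in one place, here is a minimal standalone sketch of the same classification. It is not code from the patch: `Position` and `skip_whitespace` are hypothetical names introduced for illustration, the input is assumed to be an already decoded Unicode code point, and the line-table update that the real lexer performs on a new line is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the lexer's current_line / current_column.
struct Position
{
  int line = 1;
  int column = 1;
};

// Sketch only: mirrors the whitespace cases of the patched switch.
// Returns true if cp was treated as whitespace, false if it should be
// lexed as part of a real token.
bool
skip_whitespace (std::uint32_t cp, Position &pos)
{
  switch (cp)
    {
    case '\n':   // newline
    case 0x0085: // next line
    case 0x2028: // line separator
    case 0x2029: // paragraph separator
      pos.line++;
      pos.column = 1;
      return true;
    case ' ': // space
      pos.column++;
      return true;
    case '\t': // horizontal tab, assumed to be 8 columns wide as in the patch
      pos.column += 8;
      return true;
    case '\v':   // vertical tab
    case 0x000c: // form feed
    case 0x200e: // left-to-right mark
    case 0x200f: // right-to-left mark
      // Ignored: skipped without touching line or column.
      return true;
    default:
      return false;
    }
}
```

A caller would feed each decoded code point to `skip_whitespace` and only start building a token once it returns false. As in the patch, the four ignored characters do not advance the column counter.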