public inbox for gcc-cvs@sourceware.org
* [gcc r14-7703] gccrs: fix tokenizing utf-8 whitespaces
@ 2024-01-16 17:55 Arthur Cohen
From: Arthur Cohen @ 2024-01-16 17:55 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:84a14f3d88f2568af7922c47f43960bde3904205

commit r14-7703-g84a14f3d88f2568af7922c47f43960bde3904205
Author: Raiki Tamura <tamaron1203@gmail.com>
Date:   Wed Jun 28 19:14:50 2023 +0900

    gccrs: fix tokenizing utf-8 whitespaces
    
    gcc/rust/ChangeLog:
    
            * lex/rust-lex.cc (Lexer::build_token): Add check for all kinds of whitespace.
    
    gcc/testsuite/ChangeLog:
    
            * rust/compile/torture/utf8_whitespaces.rs: New test.
    
    Signed-off-by: Raiki Tamura <tamaron1203@gmail.com>

Diff:
---
 gcc/rust/lex/rust-lex.cc                               | 13 +++++++++++--
 gcc/testsuite/rust/compile/torture/utf8_whitespaces.rs | 16 ++++++++++++++++
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/gcc/rust/lex/rust-lex.cc b/gcc/rust/lex/rust-lex.cc
index aec2a96694a..7f7fc0c80bf 100644
--- a/gcc/rust/lex/rust-lex.cc
+++ b/gcc/rust/lex/rust-lex.cc
@@ -420,7 +420,10 @@ Lexer::build_token ()
 	{
 	/* ignore whitespace characters for tokens but continue updating
 	 * location */
-	case '\n': // newline
+	case '\n':   // newline
+	case 0x0085: // next line
+	case 0x2028: // line separator
+	case 0x2029: // paragraph separator
 	  current_line++;
 	  current_column = 1;
 	  // tell line_table that new line starts
@@ -432,10 +435,16 @@ Lexer::build_token ()
 	case ' ': // space
 	  current_column++;
 	  continue;
-	case '\t': // tab
+	case '\t': // horizontal tab
 	  // width of a tab is not well-defined, assume 8 spaces
 	  current_column += 8;
 	  continue;
+	case '\v':   // vertical tab
+	case 0x000c: // form feed
+	case 0x200e: // left-to-right mark
+	case 0x200f: // right-to-left mark
+	  // Ignored.
+	  continue;
 
 	// punctuation - actual tokens
 	case '=':
diff --git a/gcc/testsuite/rust/compile/torture/utf8_whitespaces.rs b/gcc/testsuite/rust/compile/torture/utf8_whitespaces.rs
new file mode 100644
index 00000000000..b45c014812f
--- /dev/null
+++ b/gcc/testsuite/rust/compile/torture/utf8_whitespaces.rs
@@ -0,0 +1,16 @@
+fn main() {
+    // FORM FEED
+    \f
+    // LINE TABULATION (vt)
+    \v
+    // NEXT LINE (nel)
+    …
+    // LEFT-TO-RIGHT MARK
+    ‎
+    // RIGHT-TO-LEFT MARK 
+    ‏
+    // LINE SEPARATOR
+    

+    // PARAGRAPH SEPARATOR 
+    

+}
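
For readers who want the new whitespace handling in isolation, the sketch
below restates the switch cases from the rust-lex.cc hunk above as a
standalone classifier plus a position update. It is only an illustration,
not the gccrs implementation: WsKind, Position, classify_whitespace and
advance are hypothetical names, and the line_table update that the real
lexer performs on a new line is left out.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; none of these exist in gccrs.
enum class WsKind
{
  NotWhitespace,
  LineBreak, // starts a new line: line + 1, column reset to 1
  Space,     // advances the column by 1
  Tab,       // advances the column by 8 (the patch's assumption)
  Ignored    // skipped without touching line or column
};

// Classify a decoded Unicode code point the way the patched switch does.
WsKind
classify_whitespace (std::uint32_t cp)
{
  switch (cp)
    {
    case 0x000A: // newline
    case 0x0085: // next line
    case 0x2028: // line separator
    case 0x2029: // paragraph separator
      return WsKind::LineBreak;
    case 0x0020: // space
      return WsKind::Space;
    case 0x0009: // horizontal tab
      return WsKind::Tab;
    case 0x000B: // vertical tab
    case 0x000C: // form feed
    case 0x200E: // left-to-right mark
    case 0x200F: // right-to-left mark
      return WsKind::Ignored;
    default:
      return WsKind::NotWhitespace;
    }
}

// Simplified stand-in for the lexer's current_line / current_column.
struct Position
{
  int line = 1;
  int column = 1;
};

// Update pos for one code point. Returns true if the code point was
// whitespace (and was consumed), false if it should start a token.
bool
advance (Position &pos, std::uint32_t cp)
{
  switch (classify_whitespace (cp))
    {
    case WsKind::LineBreak:
      pos.line++;
      pos.column = 1;
      return true;
    case WsKind::Space:
      pos.column++;
      return true;
    case WsKind::Tab:
      pos.column += 8;
      return true;
    case WsKind::Ignored:
      return true;
    case WsKind::NotWhitespace:
      return false;
    }
  return false;
}
```

Feeding U+2028 to advance moves to the next line and resets the column,
while U+200E is consumed without changing the position, which mirrors the
"Ignored." branch in the patch.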
