On 07/08/2021 16:45, Mark Wielaard wrote:
> Zero characters (codepoints) are acceptable in strings. The current
> Lexer::parse_string skipped such zero codepoints by accidents. The
> zero codepoint was also used as error/skip indicator, but that is only
> true if the third argument of utf8_escape_pair is true (yes, it is
> called pair, but is a triple).
>
> Add a testcase that checks the (sub)strings are separated by zero
> chars. Since we cannot slice strings yet this uses extern "C"
> functions, printf and memchr.
> ---
>
> On irc bjorn3_gh pointed out that our lexer ate embedded zero chars
> from strings. This fixes that issue and adds a testcase. Also on
> https://code.wildebeest.org/git/user/mjw/gccrs/commit/?h=str-zero
>
>  gcc/rust/lex/rust-lex.cc                      |  2 +-
>  .../rust/execute/torture/str-zero.rs          | 26 +++++++++++++++++++
>  2 files changed, 27 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/rust/execute/torture/str-zero.rs
>
> diff --git a/gcc/rust/lex/rust-lex.cc b/gcc/rust/lex/rust-lex.cc
> index 0b8a8eae651..2cfbc4fb1f4 100644
> --- a/gcc/rust/lex/rust-lex.cc
> +++ b/gcc/rust/lex/rust-lex.cc
> @@ -1827,7 +1827,7 @@ Lexer::parse_string (Location loc)
>            else
>              length += std::get<1> (utf8_escape_pair);
>
> -          if (current_char32 != Codepoint (0))
> +          if (current_char32 != Codepoint (0) || !std::get<2> (utf8_escape_pair))
>              str += current_char32;
>
>            // required as parsing utf8 escape only changes current_char
> diff --git a/gcc/testsuite/rust/execute/torture/str-zero.rs b/gcc/testsuite/rust/execute/torture/str-zero.rs
> new file mode 100644
> index 00000000000..e7fba0d1372
> --- /dev/null
> +++ b/gcc/testsuite/rust/execute/torture/str-zero.rs
> @@ -0,0 +1,26 @@
> +/* { dg-output "bar foo baz foobar\n" } */
> +extern "C"
> +{
> +  fn printf(s: *const i8, ...);
> +  fn memchr(s: *const i8, c: u8, n: usize) -> *const i8;
> +}
> +
> +pub fn main () -> i32
> +{
> +  let f = "%s %s %s %s\n\0";
> +  let s = "bar\0\
> +           foo\
> +           \x00\
> +           baz\u{0000}\
> +           foobar\0";
> +  let cf = f as *const str as *const i8;
> +  let cs = s as *const str as *const i8;
> +  unsafe
> +    {
> +      let cs2 = memchr (cs, b'f', 5);
> +      let cs3 = memchr (cs2, b'b', 5);
> +      let cs4 = memchr (cs3, b'f', 5);
> +      printf (cf, cs, cs2, cs3, cs4);
> +    }
> +  0
> +}

Hi Mark,

This patch looks good to go but the clang-format check is failing:
https://github.com/Rust-GCC/gccrs/pull/615

The error seems to be that it moves the extra check onto a new line.
https://github.com/Rust-GCC/gccrs/pull/615/checks?check_run_id=3272823975

Thanks

--Phil
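
For context, here is a minimal standalone sketch of the check the patch changes, with the extra condition wrapped onto its own line. The wrap before `||` is only a guess at what clang-format is asking for; the check log linked above is authoritative. `EscapeTriple`, `append_escape` and `collect` are hypothetical names made up for this illustration and do not exist in rust-lex.cc; only `utf8_escape_pair`, `current_char32` and `str` mirror the patch.

```cpp
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the (codepoint, length, skip-flag) triple that
// utf8_escape_pair holds in the patch. The real types live in rust-lex.cc.
using EscapeTriple = std::tuple<std::uint32_t, int, bool>;

// Append the codepoint unless it is zero AND the third field marks it as an
// error/skip indicator. A genuine zero codepoint (flag false) is kept.
// The line break before `||` is an assumed formatting, not a verified one.
inline void
append_escape (std::u32string &str, const EscapeTriple &utf8_escape_pair)
{
  std::uint32_t current_char32 = std::get<0> (utf8_escape_pair);

  if (current_char32 != 0
      || !std::get<2> (utf8_escape_pair))
    str += static_cast<char32_t> (current_char32);
}

// Hypothetical helper: build a string from a sequence of parsed escapes.
inline std::u32string
collect (const std::vector<EscapeTriple> &escapes)
{
  std::u32string str;
  for (const auto &escape : escapes)
    append_escape (str, escape);
  return str;
}
```

With this shape, a triple such as (0, 2, false) contributes an embedded zero to the string, while (0, 0, true) is dropped, which is the behaviour the commit message describes.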