From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from gnu.wildebeest.org (wildebeest.demon.nl [212.238.236.112])
 by sourceware.org (Postfix) with ESMTPS id 91F3A3858023
 for ; Sun, 8 Aug 2021 13:05:37 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 91F3A3858023
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=klomp.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=klomp.org
Received: from reform (deer0x03.wildebeest.org [172.31.17.133])
 (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by gnu.wildebeest.org (Postfix) with ESMTPSA id AD9E730291A9;
 Sun, 8 Aug 2021 15:05:35 +0200 (CEST)
Received: by reform (Postfix, from userid 1000)
 id 15E082E8070E; Sun, 8 Aug 2021 15:05:35 +0200 (CEST)
Date: Sun, 8 Aug 2021 15:05:35 +0200
From: Mark Wielaard
To: Philip Herron
Cc: gcc-rust@gcc.gnu.org
Subject: Re: [PATCH] lex: accept zero codepoints in strings
Message-ID:
References: <20210807154553.441960-1-mark@klomp.org>
 <3ebee1a8-2c21-41e2-18e8-9b4248ef1551@embecosm.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="5h/fQsiz3mI5oKxz"
Content-Disposition: inline
In-Reply-To: <3ebee1a8-2c21-41e2-18e8-9b4248ef1551@embecosm.com>
X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham
 autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org
X-BeenThere: gcc-rust@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: gcc-rust mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 08 Aug 2021 13:05:39 -0000

--5h/fQsiz3mI5oKxz
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi Philip

On Sun, Aug 08, 2021 at 01:36:21PM +0100, Philip Herron wrote:
> This patch looks good to go but the clang-format check is failing:
> https://github.com/Rust-GCC/gccrs/pull/615
>
> The error seems to be that it moves the extra check onto a new line.
> https://github.com/Rust-GCC/gccrs/pull/615/checks?check_run_id=3272823975

Unfortunately it doesn't show what the actual formatting issue is. It
just says "Sign in for full log view". But I got the idea by running it
locally: the line is too long. Updated patch attached and pushed to
https://code.wildebeest.org/git/user/mjw/gccrs/commit/?h=str-zero

Cheers,

Mark

--5h/fQsiz3mI5oKxz
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="0001-lex-accept-zero-codepoints-in-strings.patch"

>From 007e6ecefb0b43d0b9e7bf85f75ec050b5c520e5 Mon Sep 17 00:00:00 2001
From: Mark Wielaard
Date: Sat, 7 Aug 2021 17:32:41 +0200
Subject: [PATCH] lex: accept zero codepoints in strings

Zero characters (codepoints) are acceptable in strings. The current
Lexer::parse_string skipped such zero codepoints by accident. The
zero codepoint was also used as an error/skip indicator, but that is
only the case if the third element of utf8_escape_pair is true (yes,
it is called pair, but it is a triple).

Add a testcase that checks that the (sub)strings are separated by zero
chars. Since we cannot slice strings yet, this uses the extern "C"
functions printf and memchr.
---
 gcc/rust/lex/rust-lex.cc                      |  3 ++-
 .../rust/execute/torture/str-zero.rs          | 26 +++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/rust/execute/torture/str-zero.rs

diff --git a/gcc/rust/lex/rust-lex.cc b/gcc/rust/lex/rust-lex.cc
index 0b8a8eae651..49b6b6d32a7 100644
--- a/gcc/rust/lex/rust-lex.cc
+++ b/gcc/rust/lex/rust-lex.cc
@@ -1827,7 +1827,8 @@ Lexer::parse_string (Location loc)
           else
             length += std::get<1> (utf8_escape_pair);
 
-          if (current_char32 != Codepoint (0))
+          if (current_char32 != Codepoint (0)
+              || !std::get<2> (utf8_escape_pair))
             str += current_char32;
 
           // required as parsing utf8 escape only changes current_char
diff --git a/gcc/testsuite/rust/execute/torture/str-zero.rs b/gcc/testsuite/rust/execute/torture/str-zero.rs
new file mode 100644
index 00000000000..e7fba0d1372
--- /dev/null
+++ b/gcc/testsuite/rust/execute/torture/str-zero.rs
@@ -0,0 +1,26 @@
+/* { dg-output "bar foo baz foobar\n" } */
+extern "C"
+{
+  fn printf(s: *const i8, ...);
+  fn memchr(s: *const i8, c: u8, n: usize) -> *const i8;
+}
+
+pub fn main () -> i32
+{
+  let f = "%s %s %s %s\n\0";
+  let s = "bar\0\
+           foo\
+           \x00\
+           baz\u{0000}\
+           foobar\0";
+  let cf = f as *const str as *const i8;
+  let cs = s as *const str as *const i8;
+  unsafe
+    {
+      let cs2 = memchr (cs, b'f', 5);
+      let cs3 = memchr (cs2, b'b', 5);
+      let cs4 = memchr (cs3, b'f', 5);
+      printf (cf, cs, cs2, cs3, cs4);
+    }
+  0
+}
-- 
2.32.0

--5h/fQsiz3mI5oKxz--
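
For readers who want to see the patched condition in isolation, here is a
minimal sketch in plain Rust (built with rustc, not gccrs). It is not the
lexer code: EscapeTriple, push_unless_skipped and lex_demo are hypothetical
names made up for this illustration, and the input triples are invented.
The only thing taken from the patch is the shape of the test: a zero
codepoint is dropped only when the third element of the triple flags it as
an error/skip indicator.

```rust
// Hypothetical stand-in for the lexer's utf8_escape_pair:
// (codepoint, consumed length, "zero codepoint is an error/skip marker").
type EscapeTriple = (char, usize, bool);

/// Append the codepoint unless it is a zero codepoint flagged as a marker.
fn push_unless_skipped(out: &mut String, escape: EscapeTriple) {
    let (codepoint, _length, is_marker) = escape;
    // Same shape as the patched condition: any non-zero codepoint is kept,
    // and a zero codepoint is kept too when the flag is not set.
    if codepoint != '\0' || !is_marker {
        out.push(codepoint);
    }
}

/// Feed a few invented triples through the check and return the result.
fn lex_demo() -> String {
    let escapes: [EscapeTriple; 6] = [
        ('a', 1, false),
        ('\0', 2, false), // a genuine zero codepoint: kept
        ('\0', 0, true),  // zero used as error/skip marker: dropped
        ('b', 1, false),
        ('\0', 4, false), // another genuine zero codepoint: kept
        ('c', 1, false),
    ];
    let mut out = String::new();
    for escape in escapes.iter().copied() {
        push_unless_skipped(&mut out, escape);
    }
    out
}

fn main() {
    let s = lex_demo();
    // The kept zero codepoints separate the substrings.
    let parts: Vec<&str> = s.split('\0').collect();
    println!("{:?}", parts);
}
```

With the pre-patch condition (codepoint != '\0' only) both genuine zero
codepoints would be dropped as well and the three pieces would run
together, which is the behaviour the new testcase guards against.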