From: Philip Herron
To: gcc-rust@gcc.gnu.org
Subject: Re: [PATCH] lex: accept zero codepoints in strings
Date: Sun, 8 Aug 2021 13:36:21 +0100
Message-ID: <3ebee1a8-2c21-41e2-18e8-9b4248ef1551@embecosm.com>
In-Reply-To: <20210807154553.441960-1-mark@klomp.org>
References: <20210807154553.441960-1-mark@klomp.org>

On 07/08/2021 16:45, Mark Wielaard wrote:
> Zero characters (codepoints) are acceptable in strings. The current
> Lexer::parse_string skipped such zero codepoints by accidents. The
> zero codepoint was also used as error/skip indicator, but that is only
> true if the third argument of utf8_escape_pair is true (yes, it is
> called pair, but is a triple).
>
> Add a testcase that checks the (sub)strings are separated by zero
> chars. Since we cannot slice strings yet this uses extern "C"
> functions, printf and memchr.
> ---
>
> On irc bjorn3_gh pointed out that our lexer ate embedded zero chars
> from strings. This fixes that issue and adds a testcase. Also on
> https://code.wildebeest.org/git/user/mjw/gccrs/commit/?h=str-zero
>
>  gcc/rust/lex/rust-lex.cc                      |  2 +-
>  .../rust/execute/torture/str-zero.rs          | 26 +++++++++++++++++++
>  2 files changed, 27 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/rust/execute/torture/str-zero.rs
>
> diff --git a/gcc/rust/lex/rust-lex.cc b/gcc/rust/lex/rust-lex.cc
> index 0b8a8eae651..2cfbc4fb1f4 100644
> --- a/gcc/rust/lex/rust-lex.cc
> +++ b/gcc/rust/lex/rust-lex.cc
> @@ -1827,7 +1827,7 @@ Lexer::parse_string (Location loc)
>            else
>              length += std::get<1> (utf8_escape_pair);
>
> -          if (current_char32 != Codepoint (0))
> +          if (current_char32 != Codepoint (0) || !std::get<2> (utf8_escape_pair))
>              str += current_char32;
>
>            // required as parsing utf8 escape only changes current_char
> diff --git a/gcc/testsuite/rust/execute/torture/str-zero.rs b/gcc/testsuite/rust/execute/torture/str-zero.rs
> new file mode 100644
> index 00000000000..e7fba0d1372
> --- /dev/null
> +++ b/gcc/testsuite/rust/execute/torture/str-zero.rs
> @@ -0,0 +1,26 @@
> +/* { dg-output "bar foo baz foobar\n" } */
> +extern "C"
> +{
> +  fn printf(s: *const i8, ...);
> +  fn memchr(s: *const i8, c: u8, n: usize) -> *const i8;
> +}
> +
> +pub fn main () -> i32
> +{
> +  let f = "%s %s %s %s\n\0";
> +  let s = "bar\0\
> +           foo\
> +           \x00\
> +           baz\u{0000}\
> +           foobar\0";
> +  let cf = f as *const str as *const i8;
> +  let cs = s as *const str as *const i8;
> +  unsafe
> +    {
> +      let cs2 = memchr (cs, b'f', 5);
> +      let cs3 = memchr (cs2, b'b', 5);
> +      let cs4 = memchr (cs3, b'f', 5);
> +      printf (cf, cs, cs2, cs3, cs4);
> +    }
> +  0
> +}

Hi Mark,

This patch looks good to go but the clang-format check is failing:
https://github.com/Rust-GCC/gccrs/pull/615

The error seems to be that it moves the extra check onto a new line.
https://github.com/Rust-GCC/gccrs/pull/615/checks?check_run_id=3272823975

Thanks

--Phil
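
For reference, here is a minimal standalone sketch of the rule the patch describes: a zero codepoint is dropped only when the third element of the escape triple marks it as a skip/error, and is otherwise appended as a real NUL. The names EscapeResult, keep_codepoint and collect are hypothetical and do not exist in gccrs; the triple layout (codepoint, length, skip flag) is an assumption taken from the commit message. The condition is wrapped onto a second line as one plausible layout; the exact formatting clang-format expects is in the linked check and is not reproduced here.

```cpp
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the lexer's escape result:
// (decoded codepoint, input characters consumed, skip flag).
using EscapeResult = std::tuple<std::uint32_t, int, bool>;

// Decide whether the decoded codepoint belongs in the string.
// Zero is only a skip marker when the flag is set; with the flag
// clear it is a genuine NUL written by the user (\0, \x00, \u{0000}).
inline bool
keep_codepoint (const EscapeResult &escape)
{
  return std::get<0> (escape) != 0
	 || !std::get<2> (escape);
}

// Build a string from a sequence of escape results, applying the rule above.
inline std::u32string
collect (const std::vector<EscapeResult> &escapes)
{
  std::u32string str;
  for (const EscapeResult &escape : escapes)
    if (keep_codepoint (escape))
      str += static_cast<char32_t> (std::get<0> (escape));
  return str;
}
```

With this sketch, feeding the results for 'b', a real NUL, a skip marker and 'f' into collect gives a three-character string whose middle character is NUL, which is the behaviour the str-zero.rs test relies on when it walks the string with memchr.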