From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ed1-x52d.google.com (mail-ed1-x52d.google.com [IPv6:2a00:1450:4864:20::52d])
        by sourceware.org (Postfix) with ESMTPS id A3DD13858415
        for ; Thu, 30 Sep 2021 09:32:58 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A3DD13858415
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com
Received: by mail-ed1-x52d.google.com with SMTP id g7so19568442edv.1
        for ; Thu, 30 Sep 2021 02:32:58 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=embecosm.com; s=google;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc;
        bh=j2ilm4N8gL/0HP4i7yc6RZRe0tA2oJtaXsyvOIODtyY=;
        b=Qn+O9NB6Q1uIjlnr7NnLShWu4gCciBzeFNRvf5oDXdeJwFSfUAwyKdbg1wKwH++ADL
         OqKkx7aFYfHQXcOknPzzALoVNgEn72SRnBVwDrwz0ysGQlbs2FitXTCtPoFMitAHkxBz
         oUK1poo5+Ud+EHGMNw5IihWuTgnpcwdlrKxvaxydxcrNE1tT0ux59j07PYzcb4Ufs7XR
         3/MPPZa9a23Cjpa5+fK2MueqnwaMGcC6QTWoaOoV9cFceAOGNLJbGG35v2RRQRKToyyR
         i7y7nbFvMzTnrykbdyX6MrIx3wpBrKAOeT4KgmgMLonbI9nF71XWoFsC84Hq/Bwsx5dA
         SULQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=j2ilm4N8gL/0HP4i7yc6RZRe0tA2oJtaXsyvOIODtyY=;
        b=hO+GpsAbQHU3C7Hp09clCidO0rGPsYUazviROWCsDjnN+KZ1E5AM9OpVJox85USOVT
         9sj1Y5tENo5JRRx0ryTzvkSyJoNbibZPBSnui+0NCoWStdukAfDuQ520ipnSFngnb7ay
         c2DZN0YG9KfoaT0I+tPkY3MgE8euWXjzM4cHxS5598n3InG7XLuU5LVKn0h0QCqfdk77
         9ZlsTgXOZh++ad4Q0SMTBfPzQ0xKPSj0aYemFLcHCExZ4aaoERt2mbNH6a3xAX9u2m7O
         wAuBEHWpVwUwlgCjlrVJAOc3zCN0FwEDH1edCDbi4TVtcYKgWgFnoJwFz5Erf5YdnvHY
         1ZSQ==
X-Gm-Message-State: AOAM53206Ynfa3vmHiN5EsNG0MEpFL51DBH/5jo4Kx1ua4Ep6sL5lmdc
        XgxT8gikGzyzQxRxxfgo+B7BMDQqzhUUwfMEcyZ4TS3Kh4ke1g==
X-Google-Smtp-Source: ABdhPJzBVzAlCoRZh/Cdu8qy8lOaz9/Hu1w8xK6W9FoO0xD6gkKFFY5LgQsVtKem6tvhj8jtZpO/B3Q8t8ULUDfqmLQ=
X-Received: by
        2002:aa7:d78e:: with SMTP id s14mr5702924edq.171.1632994377517;
        Thu, 30 Sep 2021 02:32:57 -0700 (PDT)
MIME-Version: 1.0
References: <20210929203429.563311-1-mark@klomp.org>
In-Reply-To: <20210929203429.563311-1-mark@klomp.org>
From: Philip Herron
Date: Thu, 30 Sep 2021 10:32:46 +0100
Message-ID:
Subject: Re: [PATCH] Fix raw byte string parsing of zero and out of range bytes
To: Mark Wielaard
Cc: gcc-rust@gcc.gnu.org
Content-Type: multipart/alternative; boundary="000000000000cadc3505cd3321e0"
X-Spam-Status: No, score=-9.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
        DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, HTML_MESSAGE,
        KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS,
        TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org
X-BeenThere: gcc-rust@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: gcc-rust mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 30 Sep 2021 09:33:00 -0000

--000000000000cadc3505cd3321e0
Content-Type: text/plain; charset="UTF-8"

Hi Mark,

This looks good and is currently being merged:
https://github.com/Rust-GCC/gccrs/pull/695. I will catch up with your other
patches through the day.

Thanks

--Phil

On Wed, 29 Sept 2021 at 21:35, Mark Wielaard wrote:

> Allow \0 escape in raw byte string and reject non-ASCII byte
> values. Change parse_partial_hex_escape to not skip bad characters to
> provide better error messages.
>
> Add rawbytestring.rs testcase to check string, raw string, byte string
> and raw byte string parsing.
> ---
>
> https://code.wildebeest.org/git/user/mjw/gccrs/commit/?h=parse-raw-byte-string
>
>  gcc/rust/lex/rust-lex.cc                    |  20 +++++++++++++++-----
>  gcc/testsuite/rust/compile/rawbytestring.rs | Bin 0 -> 3234 bytes
>  2 files changed, 15 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/rust/compile/rawbytestring.rs
>
> diff --git a/gcc/rust/lex/rust-lex.cc b/gcc/rust/lex/rust-lex.cc
> index b70877be9ff..bbddea04d0c 100644
> --- a/gcc/rust/lex/rust-lex.cc
> +++ b/gcc/rust/lex/rust-lex.cc
> @@ -1423,8 +1423,7 @@ Lexer::parse_partial_hex_escape ()
>    char hexNum[3] = {0, 0, 0};
>
>    // first hex char
> -  skip_input ();
> -  current_char = peek_input ();
> +  current_char = peek_input (1);
>    int additional_length_offset = 1;
>
>    if (!is_x_digit (current_char))
> @@ -1432,20 +1431,23 @@ Lexer::parse_partial_hex_escape ()
>        rust_error_at (get_current_location (),
>                      "invalid character %<\\x%c%> in \\x sequence",
>                      current_char);
> +      return std::make_pair (0, 0);
>      }
>    hexNum[0] = current_char;
>
>    // second hex char
>    skip_input ();
> -  current_char = peek_input ();
> +  current_char = peek_input (1);
>    additional_length_offset++;
>
>    if (!is_x_digit (current_char))
>      {
>        rust_error_at (get_current_location (),
> -                    "invalid character %<\\x%c%> in \\x sequence",
> +                    "invalid character %<\\x%c%c%> in \\x sequence", hexNum[0],
>                      current_char);
> +      return std::make_pair (0, 1);
>      }
> +  skip_input ();
>    hexNum[1] = current_char;
>
>    long hexLong = std::strtol (hexNum, nullptr, 16);
> @@ -1627,7 +1629,7 @@ Lexer::parse_byte_string (Location loc)
>           else
>             length += std::get<1> (escape_length_pair);
>
> -         if (output_char != 0)
> +         if (output_char != 0 || !std::get<2> (escape_length_pair))
>             str += output_char;
>
>           continue;
> @@ -1722,6 +1724,14 @@ Lexer::parse_raw_byte_string (Location loc)
>             }
>         }
>
> +      if ((unsigned char) current_char > 127)
> +       {
> +         rust_error_at (get_current_location (),
> +                        "character %<%c%> in raw byte string out of range",
> +                        current_char);
> +         current_char = 0;
> +       }
> +
>        length++;
>
>        str += current_char;
> diff --git a/gcc/testsuite/rust/compile/rawbytestring.rs b/gcc/testsuite/rust/compile/rawbytestring.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..9c6b762a7fd378206a3bfe21db5b708890f5466f
> GIT binary patch
> literal 3234
> zcmbVOO>fgc5amjL#mG4T6)1rVl`5`1a^M^Zc^f;m2zI;c($Wfvf5=~A-pqd4PU65Z
> z<9Tmp-n`vS-O~56Y3cQwv*$CS<&wUX59E5=v|Go4UDeZ9>ni$0wkR&M$OnWLi=tR8
> zvhYe0>#n0kp8Z~u3yqU0ZIOclm4066_dMaIdKBLE<JDDhNy~HEHGO5v9U=0T+ODUv
> zC8SmEy1cFEe3@FkZM6EI->90VG(Y=lGOCeT&0tuLp+z$p*Er0}$>V{I!^8|YFtTxx
> z@X*l4>D0{rUt=35bE5|t9J_s{&GuboZD*<I?tAKLvSqui3i{=Bxx4RJ6csV<pY3qx
> zc%EYYDlHYkjRXh2@TuH1=aM+;Gqyvv;&mx;T8-!69@lIn-t3a5*&*G8KFpvI38NDZ
> zXRR0;(@$zf^Mz-&9d5I9*G<Ew+mP5OSua;jH^}=FDaIRU+8^bv*+6`M(70oU!0U_=
> z&}eCgAm&LiE1Ztuo)19+f+2(Qu2!m#_4vb;|G-CZh`7Kh;Epf$lpotHpVZa9RxPJ`
> z*!LJ1N@A_5D-K2!c50h=c|}nH2&&HIi=qJdndb5#r=*{jFDfG+GVizjpnnJPCErUm
> z(~py#01%ck2asI=5SB3ogcab#=?eJBr4{72%hYuq1akuw_HYtNmI2frgB`4tK#Ur;
> zF6x6XH@P+_Ld&Pj=Khmtif_<#%m^#v8|1@fDley8DqbpRJ8##35S;)CLQU5(s-g1&
> zGH1b11D=)VWpyG#bwi0++xi+Rry%Bx8xX28AhXsD5b>@|GH+hjmxkvqUg|FRiNY&&
> zP6(%e4anlh3W@7JWKOd9E)p`E*!Jfr70=|kILq??taYC%vd4tW9O052<zlNPu3_&s
> zQXTCpCkvg$z92;~u`^Cz|8;Ub$4U{W)axrh#`o>FwtHy(W1l^5)-!Q6rkeY2pcOb5
> zC5~Q^#`Cf!S&N9GM~?nelacMDHe;2ejYcV-D%(M~7r|5FJ&7hOIQ$Oo!_o8>9i>^x
> zV>X-Uc!7Jfq5+jI?0J=nn!sj`v1wMcU}PH?O>D8bJ*^GcSU|ak{Lxs!g8aAiFN}Pz
> A0RR91
>
> literal 0
> HcmV?d00001
>
> --
> 2.32.0
>
> --
> Gcc-rust mailing list
> Gcc-rust@gcc.gnu.org
> https://gcc.gnu.org/mailman/listinfo/gcc-rust
>

--000000000000cadc3505cd3321e0
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

--000000000000cadc3505cd3321e0--
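
For readers following the patch above, here is a minimal standalone sketch of the two behaviours it describes: stopping at a bad hex digit instead of skipping past it, and rejecting bytes above 127 in a raw byte string while still keeping zero bytes. The helper names parse_hex_pair and check_raw_bytes are hypothetical and are not part of gccrs; the real lexer works on its input stream through peek_input and skip_input rather than on a std::string, so treat this only as an illustration under those assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical helper, not gccrs code: parse the two hex digits that follow
// "\x", starting at s[pos]. Returns {value, digits consumed}. On a bad digit
// it returns early without consuming the offending character, so the caller
// can report it and carry on lexing from there.
std::pair<long, int>
parse_hex_pair (const std::string &s, std::size_t pos)
{
  char hex[3] = {0, 0, 0};
  for (int i = 0; i < 2; i++)
    {
      if (pos + i >= s.size ()
	  || !std::isxdigit (static_cast<unsigned char> (s[pos + i])))
	return std::make_pair (0L, i);
      hex[i] = s[pos + i];
    }
  return std::make_pair (std::strtol (hex, nullptr, 16), 2);
}

// Hypothetical helper, not gccrs code: copy the body of a raw byte string.
// Zero bytes are kept as they are; bytes above 127 are counted as errors and
// replaced with 0, mirroring the range check added by the patch.
std::string
check_raw_bytes (const std::string &body, int &errors)
{
  std::string out;
  for (char c : body)
    {
      if (static_cast<unsigned char> (c) > 127)
	{
	  errors++;
	  c = 0;
	}
      out += c;
    }
  return out;
}
```

With these helpers, parse_hex_pair ("4g", 0) reports one consumed digit and leaves the 'g' for the caller, which is the property that lets the error message name both characters of the bad escape.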