public inbox for libstdc++-cvs@sourceware.org
* [gcc r11-10118] libstdc++: std::basic_regex should treat '\0' as an ordinary char [PR84110]
@ 2022-07-07 23:32 Jonathan Wakely
From: Jonathan Wakely @ 2022-07-07 23:32 UTC
  To: gcc-cvs, libstdc++-cvs

https://gcc.gnu.org/g:5df21c00aedb7878b8854901e95d7eda70266d31

commit r11-10118-g5df21c00aedb7878b8854901e95d7eda70266d31
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Sep 29 13:48:11 2021 +0100

    libstdc++: std::basic_regex should treat '\0' as an ordinary char [PR84110]
    
    When the input sequence contains a _CharT(0) character, the strchr call
    in _Scanner<_CharT>::_M_scan_normal() will search for '\0' and so return
    a pointer to the terminating null at the end of the string. This makes
    the scanner think it's found a special character. Because it doesn't
    match any of the actual special characters, we fall off the end of the
    function (or assert in debug mode).
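
The pitfall described above can be reproduced outside the library. The following is a minimal sketch, not the libstdc++ code: `special_chars`, `is_special_naive` and `is_special_guarded` are hypothetical names, and the character list is only an illustrative stand-in for the scanner's real table. It relies on the standard guarantee that `std::strchr` treats the terminating null as part of the string, so a search for `'\0'` always succeeds.

```cpp
#include <cstring>

// Hypothetical stand-in for the scanner's set of special characters.
static const char* const special_chars = "^$\\.*+?()[]{}|";

// Naive lookup: a query for '\0' "finds" the string's own terminator,
// so the null character is wrongly reported as special.
inline bool is_special_naive(char c)
{
  return std::strchr(special_chars, c) != nullptr;
}

// Guarded lookup: rule out the null character before calling std::strchr.
inline bool is_special_guarded(char c)
{
  return c != '\0' && std::strchr(special_chars, c) != nullptr;
}
```

With these definitions, `is_special_naive('\0')` is true while `is_special_guarded('\0')` is false; both agree on every other character.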
    
    We should check for a null character explicitly and either treat it as
    an ordinary character (for the ECMAScript grammar) or an error (for all
    others). I'm not 100% sure that's right, but it seems consistent with
    the POSIX RE rules where a '\0' means the end of the regex pattern or
    the end of the sequence being matched.
    
    Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/84110
            * include/bits/regex_error.h (regex_constants::_S_null): New
            error code for internal use.
            * include/bits/regex_scanner.tcc (_Scanner::_M_scan_normal()):
            Check for null character.
            * testsuite/28_regex/basic_regex/84110.cc: New test.
    
    (cherry picked from commit b701e1f8f6870c0f8cb4050674da489101dd05a5)
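
The policy chosen in the commit message (an ordinary character for ECMAScript, an error for every other grammar) can be sketched in isolation as below. This is an illustrative model under stated assumptions, not the library implementation: `Grammar`, `Token` and `scan_null_char` are hypothetical names, and `std::invalid_argument` stands in for the `std::regex_error` that the real scanner throws.

```cpp
#include <stdexcept>

// Hypothetical types for illustration; not the libstdc++ internals.
enum class Grammar { ecmascript, basic, extended, awk, grep, egrep };
enum class Token { ordinary_char };

// Decide what a null character in the pattern means for a given grammar:
// ECMAScript accepts it as an ordinary character, all others reject it.
inline Token scan_null_char(Grammar g)
{
  if (g != Grammar::ecmascript)
    throw std::invalid_argument(
        "Unexpected null character in regular expression");
  return Token::ordinary_char;
}
```

Calling `scan_null_char(Grammar::ecmascript)` yields `Token::ordinary_char`, while any other grammar value throws.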

Diff:
---
 libstdc++-v3/include/bits/regex_error.h            |  1 +
 libstdc++-v3/include/bits/regex_scanner.tcc        | 10 ++++++
 .../testsuite/28_regex/basic_regex/84110.cc        | 39 ++++++++++++++++++++++
 3 files changed, 50 insertions(+)

diff --git a/libstdc++-v3/include/bits/regex_error.h b/libstdc++-v3/include/bits/regex_error.h
index 27593833544..9212fd31552 100644
--- a/libstdc++-v3/include/bits/regex_error.h
+++ b/libstdc++-v3/include/bits/regex_error.h
@@ -61,6 +61,7 @@ namespace regex_constants
       _S_error_badrepeat,
       _S_error_complexity,
       _S_error_stack,
+      _S_null
     };
 
   /** The expression contained an invalid collating element name. */
diff --git a/libstdc++-v3/include/bits/regex_scanner.tcc b/libstdc++-v3/include/bits/regex_scanner.tcc
index a3512083f0e..04f78f0baee 100644
--- a/libstdc++-v3/include/bits/regex_scanner.tcc
+++ b/libstdc++-v3/include/bits/regex_scanner.tcc
@@ -176,6 +176,16 @@ namespace __detail
 	  _M_state = _S_state_in_brace;
 	  _M_token = _S_token_interval_begin;
 	}
+      else if (__builtin_expect(__c == _CharT(0), false))
+	{
+	  if (!_M_is_ecma())
+	    {
+	      __throw_regex_error(regex_constants::_S_null,
+		  "Unexpected null character in regular expression");
+	    }
+	  _M_token = _S_token_ord_char;
+	  _M_value.assign(1, __c);
+	}
       else if (__c != ']' && __c != '}')
 	{
 	  auto __it = _M_token_tbl;
diff --git a/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc b/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc
new file mode 100644
index 00000000000..b9971dcaac5
--- /dev/null
+++ b/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc
@@ -0,0 +1,39 @@
+// { dg-do run { target c++11 } }
+#include <regex>
+#include <string>
+#include <testsuite_hooks.h>
+
+void test01()
+{
+  const std::string s(1ul, '\0');
+  std::regex re(s);
+  VERIFY( std::regex_match(s, re) ); // PR libstdc++/84110
+
+#if __cpp_exceptions
+  using namespace std::regex_constants;
+  for (auto syn : {basic, extended, awk, grep, egrep})
+  {
+    try
+    {
+      std::regex{s, syn}; // '\0' is not valid for other grammars
+      VERIFY( false );
+    }
+    catch (const std::regex_error&)
+    {
+    }
+  }
+#endif
+}
+
+void test02()
+{
+  const std::string s("uh-\0h", 5);
+  std::regex re(s);
+  VERIFY( std::regex_match(s, re) );
+}
+
+int main()
+{
+  test01();
+  test02();
+}

