public inbox for systemtap@sourceware.org
From: "serhei at serhei dot io" <sourceware-bugzilla@sourceware.org>
To: systemtap@sourceware.org
Subject: [Bug translator/30395] Regex code has invalid memory reads caught by KASAN
Date: Fri, 05 May 2023 16:12:34 +0000
Message-ID: <bug-30395-6586-FC0ZgTDAnJ@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-30395-6586@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=30395

--- Comment #6 from Serhei Makarov <serhei at serhei dot io> ---
There's a simple fix that I think will work, but I'll need to add a bit of code
to double-check/guard against the state 'to' doing anything other than exiting
on a NUL. That shouldn't happen: essentially, the tweak below feeds the DFA an
unending sequence of NULs, on which it should terminate soon enough. (The extra
state transition is needed because of the rather fiddly TNFA bookkeeping I
added in 2017 to handle capture groups.)
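
To illustrate the idea, here is a minimal hand-written sketch in the style of
re2c-like generated C. It is not actual SystemTap translator output: the
function match_ab_star, the READ macro, the max_read bookkeeping, and the
regex "ab*" are all hypothetical, and it assumes each state reads its input
with *YYCURSOR++. The point is only that backing the cursor up on a NUL makes
the following state re-read the same NUL instead of reading past the end of
the string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical matcher for the anchored regex "ab*". Returns 1 on a match,
   0 otherwise. *max_read records the highest index that was read, so a
   caller can check that nothing past the terminating NUL was touched. */
static int match_ab_star(const char *str, size_t *max_read)
{
    const char *YYCURSOR = str;
    char c;

    assert(str != NULL && max_read != NULL);
    *max_read = 0;

/* Read one character, note its index, and advance the cursor. */
#define READ() do { \
        size_t idx = (size_t)(YYCURSOR - str); \
        if (idx > *max_read) *max_read = idx; \
        c = *YYCURSOR++; \
    } while (0)

    /* Start state: expect 'a'. */
    READ();
    switch (c) {
    case 'a': goto yystate1;
    default:  return 0;
    }

yystate1:   /* Seen 'a', looping on 'b'. */
    READ();
    switch (c) {
    case '\0':
        YYCURSOR--;          /* back up: the next state re-reads this NUL */
        goto yystate2;
    case 'b': goto yystate1;
    default:  return 0;
    }

yystate2:   /* Stands in for the extra bookkeeping transition. */
    READ();
    switch (c) {
    case '\0': return 1;     /* terminates on the repeated NUL */
    default:   return 0;     /* guard: anything else should be unreachable */
    }
#undef READ
}
```

Without the YYCURSOR-- line, yystate2 in this sketch would read one byte
beyond the terminator, which is the kind of access KASAN flags.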

Note that uncommenting STAPREGEX_DEBUG_DFA in stapregex-dfa.cxx will produce a
trace of visited states. I can clearly see how far it goes beyond the NUL when
matching against a statically allocated string :(

diff --git a/stapregex-dfa.cxx b/stapregex-dfa.cxx
index 3601b28dd..cae8e2494 100644
--- a/stapregex-dfa.cxx
+++ b/stapregex-dfa.cxx
@@ -1020,7 +1020,7 @@ span::emit_jump (translator_output *o, const dfa *d) const

   if (to->accepts)
     {
-      emit_final(o, d);
+      emit_final(o, d, false /*saw_nul*/);
       return;
     }

@@ -1033,7 +1033,7 @@ span::emit_jump (translator_output *o, const dfa *d) const
 /* Assuming the target DFA state of the span is a final state, emit code to
    cleanup tags and (if appropriate) exit with a final answer. */
 void
-span::emit_final (translator_output *o, const dfa *d) const
+span::emit_final (translator_output *o, const dfa *d, bool saw_nul) const
 {
   assert (to->accepts); // XXX: must guarantee correct usage of emit_final()

@@ -1087,6 +1087,11 @@ span::emit_final (translator_output *o, const dfa *d) const
       o->indent(-1);
       o->newline() << "}";

+      if (saw_nul)
+          {
+              o->newline () << "/* XXX PROBLEM TRANSITION XXX */"; /* DEBUG */
+              o->newline () << "YYCURSOR--;"; /* SUGGESTED FIX: the next state will encounter a repeated NUL */
+          }
       o->newline () << "goto yystate" << to->label << ";";
     }
 }
@@ -1119,10 +1124,11 @@ state::emit (translator_output *o, const dfa *d) const
       if (it->lb == '\0')
         {
           o->newline() << "case " << c_char('\0') << ":";
-          it->emit_final(o, d);
+          it->emit_final(o, d, true /* saw_nul */);
         }

       // Emit labels to handle all the other elements of the span:
diff --git a/stapregex-dfa.h b/stapregex-dfa.h
index c9a398fd7..065e1fe41 100644
--- a/stapregex-dfa.h
+++ b/stapregex-dfa.h
@@ -103,7 +103,7 @@ struct span {
   state_kernel *reach_pairs; // -- starting point for te_closure computation

   void emit_jump (translator_output *o, const dfa *d) const;
-  void emit_final (translator_output *o, const dfa *d) const;
+  void emit_final (translator_output *o, const dfa *d, bool saw_nul) const;
 };

 struct state {
