public inbox for systemtap@sourceware.org
* [Bug runtime/15065] New: regular expressions: subexpression capture support
From: smakarov at redhat dot com @ 2013-01-25 19:52 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=15065

             Bug #: 15065
           Summary: regular expressions: subexpression capture support
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
        AssignedTo: systemtap@sourceware.org
        ReportedBy: smakarov@redhat.com
    Classification: Unclassified


After running a regular expression, the coordinates of matched ()
subexpressions should be saved to the probe-local data. The resulting matched
strings should then be extracted using a tapset function written in embedded C,
e.g. matched(0), matched(1), &c.
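
As a rough illustration of the idea (not SystemTap code), the sketch below
uses Python's re module as a stand-in matcher. The names run_regex and the
Python matched helper are hypothetical; they only mimic saving per-group
coordinates after a match and slicing the subject string afterwards.

```python
import re

# Hypothetical stand-in for probe-local storage: a list of (start, end)
# offsets, one pair per group, with group 0 being the whole match.
def run_regex(pattern, subject):
    m = re.search(pattern, subject)
    if m is None:
        return None
    return [m.span(i) for i in range(m.re.groups + 1)]

# Hypothetical analogue of the proposed matched(n) function: slice the
# subject string using the saved coordinates.
def matched(subject, coords, n):
    start, end = coords[n]
    if start < 0:  # Python reports (-1, -1) for a group that took no part
        return ""
    return subject[start:end]

subject = "xabcdy"
coords = run_regex(r"a(bc)d", subject)
print(coords)                       # [(1, 5), (2, 4)]
print(matched(subject, coords, 0))  # abcd
print(matched(subject, coords, 1))  # bc
```

Storing offsets rather than copies of the substrings keeps the saved state
small and fixed in size per group, which is presumably why the report asks
for coordinates.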

A couple of papers dealing with how to support subexpressions in a DFA are
here:

http://yrx.googlecode.com/files/yrxreg.pdf (yrx)
http://laurikari.net/ville/spire2000-tnfa.pdf

(Each one describes an existing regular expression engine, so there's also
working code to look at in each case.)

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

* [Bug runtime/15065] regular expressions: subexpression capture support
From: smakarov at redhat dot com @ 2013-05-25 23:15 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=15065

Serguei Makarov <smakarov at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fche at redhat dot com

* [Bug runtime/15065] regular expressions: subexpression capture support
From: serhei.public at gmail dot com @ 2017-08-03 16:31 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=15065

Serhei Makarov <serhei.public at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |serhei.public at gmail dot com

--- Comment #1 from Serhei Makarov <serhei.public at gmail dot com> ---
Just a heads up that I'm working on this feature, with my current code at
https://github.com/serhei/stap-experiments/commits/serhei/tnfa. I wrote a
solution based on Laurikari's TNFA algorithm.

The testsuite
(https://github.com/serhei/stap-experiments/blob/serhei/tnfa/testsuite/runok/regex_grouping.stp)
needs to be a lot more complete before I'm willing to put confidence in this
feature. Current results look like this:

regex PASS: #1: aaa =~ a* with 1 groups 'aaa' 
regex FAIL (grouping): #2: abab =~ (ab)* with 2 groups '' '' 
regex PASS: #3: cabab =~ c(ab)* with 2 groups 'cabab' 'ab' 
regex PASS: #4: aaa =~ (a*)a*a with 2 groups 'aaa' 'aa' 
regex PASS: #5: regex =~ re(gex) with 2 groups 'regex' 'gex' 
regex PASS: #6: longer =~ (long|longer) with 2 groups 'longer' 'longer' 
regex PASS: #7: unrelated !~ regex
regex PASS: #8: \ =~ \\ with 1 groups '\' 
regex PASS: #9: xabcy =~ abc with 1 groups 'abc' 
regex PASS: #10: abbbbc =~ ab*bc with 1 groups 'abbbbc' 
regex PASS: #11: abbc =~ ab?bc with 1 groups 'abbc' 
regex PASS: #12: abcc !~ ^abc$
regex PASS: #13: abd !~ a[b-d]e
regex PASS: #14: ace =~ a[b-d]e with 1 groups 'ace' 
regex PASS: #15: ab =~ a\(*b with 1 groups 'ab' 
regex PASS: #16: a((b =~ a\(*b with 1 groups 'a((b' 
regex PASS: #17: ab =~ (a+|b)* with 2 groups 'ab' 'b' 
regex PASS: #18: ab =~ (a+|b)+ with 2 groups 'ab' 'b' 
regex PASS: #19: abbbcd =~ ([abc])*d with 2 groups 'abbbcd' 'c' 
regex PASS: #20: abcde !~ ^(ab|cd)e
regex PASS: #21: abcde =~ (ab|cd)e with 2 groups 'cde' 'cd' 
regex PASS: #22: abcde =~ (ab|cd)e$ with 2 groups 'cde' 'cd' 
regex PASS: #23: alpha =~ [A-Za-z_][A-Za-z0-9_]* with 1 groups 'alpha' 
regex PASS: #24: ij =~ (bc+d$|ef*g.|h?i(j|k)) with 3 groups 'ij' 'ij' 'j' 
regex PASS: #25: effg !~ (bc+d$|ef*g.|h?i(j|k))
regex PASS: #26: 00effg12 =~ (bc+d$|ef*g.|h?i(j|k)) with 3 groups 'effg1' 'effg1' '' 
regex PASS: #27: bcccd =~ (bc+d$|ef*g.|h?i(j|k)) with 3 groups 'bcccd' 'bcccd' '' 
regex PASS: #28: a =~ (((((((((a))))))))) with 10 groups 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 
regex PASS: #29: (.*)\) !~ \((.*),
regex PASS: #30: ab !~ [k]
regex PASS: #31: abcd =~ abcd with 1 groups 'abcd' 
regex PASS: #32: abcd =~ a(bc)d with 2 groups 'abcd' 'bc' 

regex total PASS: 31, FAIL: 1
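
For readers who want to sanity-check the grouping results above without a
SystemTap build, a few of the listed cases can be replayed against Python's
re module. This is only a cross-check under assumptions: Python's engine is
a backtracking matcher, not the TNFA implementation discussed here, and its
alternation picks the first alternative that works (it would report 'long'
rather than 'longer' for case #6). The cases below are ones where the two
agree. The helper name groups_of is made up for this sketch.

```python
import re

# (subject, pattern, expected groups) copied from the results listed above.
CASES = [
    ("cabab", r"c(ab)*", ["cabab", "ab"]),    # 3
    ("aaa", r"(a*)a*a", ["aaa", "aa"]),       # 4
    ("regex", r"re(gex)", ["regex", "gex"]),  # 5
    ("abcde", r"(ab|cd)e", ["cde", "cd"]),    # 21
    ("abcd", r"a(bc)d", ["abcd", "bc"]),      # 32
]

def groups_of(pattern, subject):
    """Return all groups (group 0 first), or None if there is no match."""
    m = re.search(pattern, subject)
    if m is None:
        return None
    # A group that took no part in the match is reported as '' here.
    return [m.group(i) or "" for i in range(m.re.groups + 1)]

for subject, pattern, expected in CASES:
    status = "PASS" if groups_of(pattern, subject) == expected else "FAIL"
    print(status, subject, "=~", pattern, expected)

# Case #7 is a non-match; case #2 is the one currently failing above.
print(groups_of(r"regex", "unrelated"))  # None
print(groups_of(r"(ab)*", "abab"))       # ['abab', 'ab']
```

Python reports 'abab' 'ab' for case #2; that this is also the value the
testsuite expects is an assumption, since the output above only shows the
failing result.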

* [Bug runtime/15065] regular expressions: subexpression capture support
From: serhei.public at gmail dot com @ 2017-09-08 13:44 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=15065

--- Comment #2 from Serhei Makarov <serhei.public at gmail dot com> ---
I've formatted the regex functionality at
https://github.com/serhei/stap-experiments/commits/serhei/tnfa-submit and I
think it's ready to be considered for the master branch.

For anyone doing code review, commit (3/8) touches the translator while commit
(7/8) contains the parts generating kernel-facing matcher code. The rest of the
functionality is self-contained in tapset/regex.stp and the stapregex*.{h,cxx}
files.

* [Bug runtime/15065] regular expressions: subexpression capture support
From: fche at redhat dot com @ 2017-09-08 14:09 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=15065

--- Comment #3 from Frank Ch. Eigler <fche at redhat dot com> ---
Briefly scanned over the code, looks good ... almost computer sciencey!
Would you be interested in drafting NEWS / stap.1 blurbs for same?

* [Bug runtime/15065] regular expressions: subexpression capture support
From: serhei.public at gmail dot com @ 2017-09-08 15:39 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=15065

--- Comment #4 from Serhei Makarov <serhei.public at gmail dot com> ---
Certainly, I can add more documentation this weekend.

There was also some talk of wrapping the matched() tapset functions in a
pseudovariable syntax ("\1" for "matched(1)"), but unfortunately my plate is a
bit too full at the moment to implement it cleanly. Someone else might be able
to get around to implementing it more quickly.

Another bit of sugar that might make sense is a regex literal syntax /regex/.
With the way the current implementation works, this would just be a string
literal, but with slightly different rules for how backslash is parsed, so you
could write /a\(*b/ instead of "a\\(*b".

* [Bug runtime/15065] regular expressions: subexpression capture support
From: jistone at redhat dot com @ 2017-09-09  0:01 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=15065

Josh Stone <jistone at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jistone at redhat dot com

--- Comment #5 from Josh Stone <jistone at redhat dot com> ---
Some languages have a raw string syntax, e.g. Python r"a\(*b", which is useful
for regex but also applicable anywhere strings are used.

* [Bug runtime/15065] regular expressions: subexpression capture support
From: serhei.public at gmail dot com @ 2017-09-12  2:27 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=15065

--- Comment #6 from Serhei Makarov <serhei.public at gmail dot com> ---
I've pushed some draft documentation to the branch on GitHub.

The tapset regex.stp contains additional documentation for matched() and
ngroups(), though I don't remember how to get it included in the tapset
reference.

* [Bug runtime/15065] regular expressions: subexpression capture support
From: fche at redhat dot com @ 2017-09-13 21:26 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=15065

--- Comment #7 from Frank Ch. Eigler <fche at redhat dot com> ---

> The tapset regex.stp contains additional documentation for matched() and
> ngroups(), though I don't remember how to get it included in the tapset
> reference.

That part's the job of doc/SystemTap_Tapset_Reference/tapsets.tmpl, plus
configure --enable-docs .

* [Bug runtime/15065] regular expressions: subexpression capture support
From: fche at redhat dot com @ 2017-09-15 19:09 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=15065

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Frank Ch. Eigler <fche at redhat dot com> ---
Merged from serhei: 252fef44ef6
