public inbox for glibc-bugs-regex@sourceware.org
help / color / mirror / Atom feed
* [Bug regex/611] New: regex with a long character sequence requires huge stack space
@ 2004-12-17  8:34 bonzini at gnu dot org
  2004-12-17  8:38 ` [Bug regex/611] " bonzini at gnu dot org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: bonzini at gnu dot org @ 2004-12-17  8:34 UTC (permalink / raw)
  To: glibc-bugs-regex

The parse tree produced by concat nodes is extremely unbalanced.  With an 8 MB 
stack,

perl -e "print 's/' . ('a' x 300000) . '//'" > huge
sed -f huge

segfaults in calc_first.  Of course with a lower stack size it takes much 
shorter regexes to trigger the bug, and anyway this is not nice behavior (ok, 
regex is a memory hog but at least it usually gives REG_ESPACE).

With a 512 KB stack, even compiling a{32767} segfaults.

-- 
           Summary: regex with a long character sequence requires huge stack
                    space
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: regex
        AssignedTo: bonzini at gnu dot org
        ReportedBy: bonzini at gnu dot org
                CC: glibc-bugs-regex at sources dot redhat dot com,glibc-
                    bugs at sources dot redhat dot com


http://sources.redhat.com/bugzilla/show_bug.cgi?id=611

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug regex/611] regex with a long character sequence requires huge stack space
  2004-12-17  8:34 [Bug regex/611] New: regex with a long character sequence requires huge stack space bonzini at gnu dot org
@ 2004-12-17  8:38 ` bonzini at gnu dot org
  2004-12-20 10:22 ` bonzini at gnu dot org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: bonzini at gnu dot org @ 2004-12-17  8:38 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2004-12-17 08:37 -------
Created an attachment (id=312)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=312&action=view)
patch fixing this bug

This patch is a big one that completely separates parse tree creation and NFA
construction.  It was posted on
http://sources.redhat.com/ml/libc-alpha/2004-12/msg00045.html first; this
version is identical except that it includes the bug 605 changes on which the
libc-alpha patch is based.

Improvements in this patch include:

1) It eliminates recursion on CONCAT nodes from regcomp.c; with an 8 MB 
stack,

perl -e "print 's/' . ('a' x 300000) . '//'" | sed -f -

segfaults in analyze_tree without the patch.  Of course with a lower 
stack size it takes much shorter regexes to trigger the bug, and anyway 
this is not nice behavior (ok, regex is a memory hog but at least it 
usually gives REG_ESPACE). I don't have the time to fix this in a 
different way -- as I said, I am not doing this just for fun.

2) It provides a useful infrastructure while actually removing code from 
regcomp.c (the patched file is ~60 lines shorter).  This infrastructure 
can help in turn removing code from regexec.c and making it faster (see 
for example the star-normal form mentioned in libc-alpha).


-- 


http://sources.redhat.com/bugzilla/show_bug.cgi?id=611

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug regex/611] regex with a long character sequence requires huge stack space
  2004-12-17  8:34 [Bug regex/611] New: regex with a long character sequence requires huge stack space bonzini at gnu dot org
  2004-12-17  8:38 ` [Bug regex/611] " bonzini at gnu dot org
@ 2004-12-20 10:22 ` bonzini at gnu dot org
  2004-12-27  9:57 ` bonzini at gnu dot org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: bonzini at gnu dot org @ 2004-12-20 10:22 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2004-12-20 10:22 -------
I'm putting pending patches at P1, just as a convention.


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Priority|P2                          |P1


http://sources.redhat.com/bugzilla/show_bug.cgi?id=611

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug regex/611] regex with a long character sequence requires huge stack space
  2004-12-17  8:34 [Bug regex/611] New: regex with a long character sequence requires huge stack space bonzini at gnu dot org
  2004-12-17  8:38 ` [Bug regex/611] " bonzini at gnu dot org
  2004-12-20 10:22 ` bonzini at gnu dot org
@ 2004-12-27  9:57 ` bonzini at gnu dot org
  2004-12-27 15:11 ` bonzini at gnu dot org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: bonzini at gnu dot org @ 2004-12-27  9:57 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2004-12-27 09:57 -------
This patch, introducing generic visitors, may also help in the elimination of 
CONCAT nodes, which are a memory hog.  They could be replaced (together with 
the calc_first pass) with a linked list of the binary trees that are 
concatenated.

-- 


http://sources.redhat.com/bugzilla/show_bug.cgi?id=611

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug regex/611] regex with a long character sequence requires huge stack space
  2004-12-17  8:34 [Bug regex/611] New: regex with a long character sequence requires huge stack space bonzini at gnu dot org
                   ` (2 preceding siblings ...)
  2004-12-27  9:57 ` bonzini at gnu dot org
@ 2004-12-27 15:11 ` bonzini at gnu dot org
  2004-12-27 16:43 ` drepper at redhat dot com
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: bonzini at gnu dot org @ 2004-12-27 15:11 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2004-12-27 15:11 -------
Created an attachment (id=324)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=324&action=view)
version of the patch not including the changes in BZ605 (same as posted to
libc-alpha)


-- 


http://sources.redhat.com/bugzilla/show_bug.cgi?id=611

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug regex/611] regex with a long character sequence requires huge stack space
  2004-12-17  8:34 [Bug regex/611] New: regex with a long character sequence requires huge stack space bonzini at gnu dot org
                   ` (3 preceding siblings ...)
  2004-12-27 15:11 ` bonzini at gnu dot org
@ 2004-12-27 16:43 ` drepper at redhat dot com
  2004-12-28 16:34 ` bonzini at gnu dot org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: drepper at redhat dot com @ 2004-12-27 16:43 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From drepper at redhat dot com  2004-12-27 16:43 -------
I'm waiting to be able to apply the patch in bug 605.

Also, always use unified diffs, not this horrible, unreadable context diff format.

-- 


http://sources.redhat.com/bugzilla/show_bug.cgi?id=611

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug regex/611] regex with a long character sequence requires huge stack space
  2004-12-17  8:34 [Bug regex/611] New: regex with a long character sequence requires huge stack space bonzini at gnu dot org
                   ` (4 preceding siblings ...)
  2004-12-27 16:43 ` drepper at redhat dot com
@ 2004-12-28 16:34 ` bonzini at gnu dot org
  2005-01-26 22:28 ` drepper at redhat dot com
  2005-01-26 22:43 ` drepper at redhat dot com
  7 siblings, 0 replies; 9+ messages in thread
From: bonzini at gnu dot org @ 2004-12-28 16:34 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2004-12-28 16:34 -------
Ok.  I usually prefer context diffs for very large patches, but I'll remember this.

I've updated BZ 605.

Paolo

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |605


http://sources.redhat.com/bugzilla/show_bug.cgi?id=611

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug regex/611] regex with a long character sequence requires huge stack space
  2004-12-17  8:34 [Bug regex/611] New: regex with a long character sequence requires huge stack space bonzini at gnu dot org
                   ` (5 preceding siblings ...)
  2004-12-28 16:34 ` bonzini at gnu dot org
@ 2005-01-26 22:28 ` drepper at redhat dot com
  2005-01-26 22:43 ` drepper at redhat dot com
  7 siblings, 0 replies; 9+ messages in thread
From: drepper at redhat dot com @ 2005-01-26 22:28 UTC (permalink / raw)
  To: glibc-bugs-regex



-- 
Bug 611 depends on bug 605, which changed state.

Bug 605 Summary: regex goes uselessly through slow paths
http://sources.redhat.com/bugzilla/show_bug.cgi?id=605

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://sources.redhat.com/bugzilla/show_bug.cgi?id=611

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug regex/611] regex with a long character sequence requires huge stack space
  2004-12-17  8:34 [Bug regex/611] New: regex with a long character sequence requires huge stack space bonzini at gnu dot org
                   ` (6 preceding siblings ...)
  2005-01-26 22:28 ` drepper at redhat dot com
@ 2005-01-26 22:43 ` drepper at redhat dot com
  7 siblings, 0 replies; 9+ messages in thread
From: drepper at redhat dot com @ 2005-01-26 22:43 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From drepper at redhat dot com  2005-01-26 22:43 -------
Applied to cvs after adjusted to current sources.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://sources.redhat.com/bugzilla/show_bug.cgi?id=611

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2005-01-26 22:43 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-12-17  8:34 [Bug regex/611] New: regex with a long character sequence requires huge stack space bonzini at gnu dot org
2004-12-17  8:38 ` [Bug regex/611] " bonzini at gnu dot org
2004-12-20 10:22 ` bonzini at gnu dot org
2004-12-27  9:57 ` bonzini at gnu dot org
2004-12-27 15:11 ` bonzini at gnu dot org
2004-12-27 16:43 ` drepper at redhat dot com
2004-12-28 16:34 ` bonzini at gnu dot org
2005-01-26 22:28 ` drepper at redhat dot com
2005-01-26 22:43 ` drepper at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).