public inbox for glibc-bugs-regex@sourceware.org
help / color / mirror / Atom feed
* [Bug regex/558] New: regcomp and regexec bug
@ 2004-11-18  2:06 vprodan dot hosting at artstyle dot ru
  2004-11-22  9:15 ` [Bug regex/558] " bonzini at gnu dot org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: vprodan dot hosting at artstyle dot ru @ 2004-11-18  2:06 UTC (permalink / raw)
  To: glibc-bugs-regex

Some regular expressions make regcomp to eat CPU and memory.
Try run date |egrep '.{1,2048}' and enjoy the result if you're patient enough.

Test case:
#include <stdio.h>
#include <regex.h>

int main(void) {
    char s[] = "/.{1,2048}.{1,2048}.{1,2048}.{1,2048}.{1,2048}/";
    regex_t reg;                      
    int res;    
    fprintf(stderr, "starting\n");
    res = regcomp(&reg, s, REG_EXTENDED);
    fprintf(stderr, "regcomp(&reg, '%s', REG_EXTENDED) = %d\n", s, res);
}

This bug appeared in between 2.2.5 and 2.3:

> date: 2002/02/27 19:00:56;  author: drepper;  state: Exp;  lines: +7 -8399
> Check in complete rewrite.

-- 
           Summary: regcomp and regexec bug
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: regex
        AssignedTo: gotom at debian dot or dot jp
        ReportedBy: vprodan dot hosting at artstyle dot ru
                CC: glibc-bugs-regex at sources dot redhat dot com,glibc-
                    bugs at sources dot redhat dot com,vprodan dot hosting
                    at artstyle dot ru
 GCC build triplet: i686-pc-linux-gnu


http://sources.redhat.com/bugzilla/show_bug.cgi?id=558

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug regex/558] regcomp and regexec bug
  2004-11-18  2:06 [Bug regex/558] New: regcomp and regexec bug vprodan dot hosting at artstyle dot ru
@ 2004-11-22  9:15 ` bonzini at gnu dot org
  2004-11-23 12:31 ` bonzini at gnu dot org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: bonzini at gnu dot org @ 2004-11-22  9:15 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2004-11-22 09:15 -------
It can be fixed by lowering for example .{1,5} to (.(.(.(..?)?)?)?)? instead 
of .?.?.?.?.? --- of course the brackets are not capturing, they're only there 
to show what is the question mark applied to.

This has the property of making all the period nodes epsilon-transit to the end 
of the braced expression, rather than to the next period.  The epsilon closure 
thus is much smaller.

Another improvement is to use a special version of re_node_set_insert in 
calc_inveclosure, since we know that the node is being added at the end of the 
set.  I have a patch to do that and it obtains a 30% improvement on this test 
case.  It does not cure the complexity problem though, so I am not submitting 
it yet.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|gotom at debian dot or dot  |bonzini at gnu dot org
                   |jp                          |
             Status|NEW                         |ASSIGNED


http://sources.redhat.com/bugzilla/show_bug.cgi?id=558

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug regex/558] regcomp and regexec bug
  2004-11-18  2:06 [Bug regex/558] New: regcomp and regexec bug vprodan dot hosting at artstyle dot ru
  2004-11-22  9:15 ` [Bug regex/558] " bonzini at gnu dot org
@ 2004-11-23 12:31 ` bonzini at gnu dot org
  2004-11-23 12:32 ` bonzini at gnu dot org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: bonzini at gnu dot org @ 2004-11-23 12:31 UTC (permalink / raw)
  To: glibc-bugs-regex



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |500
              nThis|                            |


http://sources.redhat.com/bugzilla/show_bug.cgi?id=558

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug regex/558] regcomp and regexec bug
  2004-11-18  2:06 [Bug regex/558] New: regcomp and regexec bug vprodan dot hosting at artstyle dot ru
  2004-11-22  9:15 ` [Bug regex/558] " bonzini at gnu dot org
  2004-11-23 12:31 ` bonzini at gnu dot org
@ 2004-11-23 12:32 ` bonzini at gnu dot org
  2004-12-10 16:56 ` bonzini at gnu dot org
  2005-01-27 19:08 ` drepper at redhat dot com
  4 siblings, 0 replies; 6+ messages in thread
From: bonzini at gnu dot org @ 2004-11-23 12:32 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2004-11-23 12:32 -------
Created an attachment (id=289)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=289&action=view)
patch to speed up testcase by 50%

Reduce complexity of calc_eclosure in this case, by modifying the way .{1,2048}
is lowered.  calc_inveclosure is sped up but still expensive.

-- 


http://sources.redhat.com/bugzilla/show_bug.cgi?id=558

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug regex/558] regcomp and regexec bug
  2004-11-18  2:06 [Bug regex/558] New: regcomp and regexec bug vprodan dot hosting at artstyle dot ru
                   ` (2 preceding siblings ...)
  2004-11-23 12:32 ` bonzini at gnu dot org
@ 2004-12-10 16:56 ` bonzini at gnu dot org
  2005-01-27 19:08 ` drepper at redhat dot com
  4 siblings, 0 replies; 6+ messages in thread
From: bonzini at gnu dot org @ 2004-12-10 16:56 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2004-12-10 16:56 -------
Patch applied.  I'm going through some more changes in the regex matcher so I'm
delaying the inveclosure work for a while.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |SUSPENDED


http://sources.redhat.com/bugzilla/show_bug.cgi?id=558

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug regex/558] regcomp and regexec bug
  2004-11-18  2:06 [Bug regex/558] New: regcomp and regexec bug vprodan dot hosting at artstyle dot ru
                   ` (3 preceding siblings ...)
  2004-12-10 16:56 ` bonzini at gnu dot org
@ 2005-01-27 19:08 ` drepper at redhat dot com
  4 siblings, 0 replies; 6+ messages in thread
From: drepper at redhat dot com @ 2005-01-27 19:08 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From drepper at redhat dot com  2005-01-27 19:07 -------
Updated patch sent to libc-hacker checked into cvs.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|SUSPENDED                   |RESOLVED
         Resolution|                            |FIXED


http://sources.redhat.com/bugzilla/show_bug.cgi?id=558

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2005-01-27 19:08 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-11-18  2:06 [Bug regex/558] New: regcomp and regexec bug vprodan dot hosting at artstyle dot ru
2004-11-22  9:15 ` [Bug regex/558] " bonzini at gnu dot org
2004-11-23 12:31 ` bonzini at gnu dot org
2004-11-23 12:32 ` bonzini at gnu dot org
2004-12-10 16:56 ` bonzini at gnu dot org
2005-01-27 19:08 ` drepper at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).