public inbox for glibc-bugs-regex@sourceware.org
help / color / mirror / Atom feed
* [Bug regex/501] New: transit_state is slow
@ 2004-11-04  8:51 bonzini at gnu dot org
  2004-11-10 10:55 ` [Bug regex/501] " bonzini at gnu dot org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-11-04  8:51 UTC (permalink / raw)
  To: glibc-bugs-regex

transit_state does way a lot scaffolding before actually transiting to a new
state.  We have to look at state->word_trtable to check if we are dealing with a
256-entry or 512-entry transition table. The patch at

http://sources.redhat.com/ml/libc-alpha/2004-10/msg00164.html

helps by storing the pointers to the two kinds of transition table in two
different fields of the structure.  We can penalize the 512-entry transition
table, because it calls iswalnum and this is very slow anyway.

-- 
           Summary: transit_state is slow
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: regex
        AssignedTo: bonzini at gnu dot org
        ReportedBy: bonzini at gnu dot org
                CC: glibc-bugs-regex at sources dot redhat dot com,glibc-
                    bugs at sources dot redhat dot com
OtherBugsDependingO 500
             nThis:


http://sources.redhat.com/bugzilla/show_bug.cgi?id=501

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug regex/501] transit_state is slow
  2004-11-04  8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
@ 2004-11-10 10:55 ` bonzini at gnu dot org
  2004-11-10 14:05 ` drepper at redhat dot com
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-11-10 10:55 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2004-11-10 10:54 -------
Created an attachment (id=273)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=273&action=view)
updated patch


-- 


http://sources.redhat.com/bugzilla/show_bug.cgi?id=501

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug regex/501] transit_state is slow
  2004-11-04  8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
  2004-11-10 10:55 ` [Bug regex/501] " bonzini at gnu dot org
@ 2004-11-10 14:05 ` drepper at redhat dot com
  2004-11-10 14:19 ` bonzini at gnu dot org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: drepper at redhat dot com @ 2004-11-10 14:05 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From drepper at redhat dot com  2004-11-10 14:04 -------
I fail to see what your patch is supposed to improve.  From what I can see it
replaces the bit word_trtable with a pointer.  Now ->trtable == NULL means the
same as word_trtable == 1 before.  The code in transit_state is slightly
reordered but in no way that should make a difference.

Please explain.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING


http://sources.redhat.com/bugzilla/show_bug.cgi?id=501

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug regex/501] transit_state is slow
  2004-11-04  8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
  2004-11-10 10:55 ` [Bug regex/501] " bonzini at gnu dot org
  2004-11-10 14:05 ` drepper at redhat dot com
@ 2004-11-10 14:19 ` bonzini at gnu dot org
  2004-11-11  9:20 ` bonzini at gnu dot org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-11-10 14:19 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2004-11-10 14:18 -------
The best path with my patch is

       trtable = state->trtable;
       if (BE (trtable != NULL, 1))
 	return trtable[ch];

versus
 
       trtable = state->trtable;
       if (trtable == NULL)
         ...
       if (BE (state->word_trtable, 0))
         ...
       else
 	 return trtable[ch];

where state->word_trtable was a bitfield.  Removing the access to this bitfield
does make a difference of about 3-4% on some regexps because this is a very busy
path.  A regexp which benefits from this is simply .*; an NFA matcher can do
this in O(1) time, while we are bound to O(N) anyway, so it is important to make
it as fast as possible.

-- 


http://sources.redhat.com/bugzilla/show_bug.cgi?id=501

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug regex/501] transit_state is slow
  2004-11-04  8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
                   ` (2 preceding siblings ...)
  2004-11-10 14:19 ` bonzini at gnu dot org
@ 2004-11-11  9:20 ` bonzini at gnu dot org
  2004-11-13 12:59 ` bonzini at gnu dot org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-11-11  9:20 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2004-11-11 09:20 -------
Here is a testcase.  Timings taken on a PPC G4 at 1.5 GHz

time perl -e 'print "A" x 10000' | ./sed -n '/[A-Z].*[0-9]$/p'
  without patch: 1.98s
  with patch: 1.89s

time perl -e 'print "A" x 30000' | ./sed -n '/[A-Z].*[0-9]$/p'
  without patch: 17.59s
  with patch: 17.11s

In this case, regex complexity is quadratic.  This is a savings of 2.7%; I
remember seeing slightly higher figures on x86.

-- 


http://sources.redhat.com/bugzilla/show_bug.cgi?id=501

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug regex/501] transit_state is slow
  2004-11-04  8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
                   ` (3 preceding siblings ...)
  2004-11-11  9:20 ` bonzini at gnu dot org
@ 2004-11-13 12:59 ` bonzini at gnu dot org
  2004-11-13 13:00 ` bonzini at gnu dot org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-11-13 12:59 UTC (permalink / raw)
  To: glibc-bugs-regex



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |SUSPENDED


http://sources.redhat.com/bugzilla/show_bug.cgi?id=501

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug regex/501] transit_state is slow
  2004-11-04  8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
                   ` (4 preceding siblings ...)
  2004-11-13 12:59 ` bonzini at gnu dot org
@ 2004-11-13 13:00 ` bonzini at gnu dot org
  2004-12-01 17:01 ` bonzini at gnu dot org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-11-13 13:00 UTC (permalink / raw)
  To: glibc-bugs-regex



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|SUSPENDED                   |WAITING


http://sources.redhat.com/bugzilla/show_bug.cgi?id=501

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug regex/501] transit_state is slow
  2004-11-04  8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
                   ` (5 preceding siblings ...)
  2004-11-13 13:00 ` bonzini at gnu dot org
@ 2004-12-01 17:01 ` bonzini at gnu dot org
  2004-12-10 16:50 ` bonzini at gnu dot org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-12-01 17:01 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2004-12-01 17:01 -------
I can reproduce a saving of 3% on non-quadratic cases too.  This testcase just
chews up 100 megabytes of data

perl -e 'for ($i = 1; $i < 10000; $i++)
  {print ("A" x 10000); print "\n"}' | /usr/bin/time ./sed -ne 's/.*//'

without patch:       9.31 user
with patch:          9.05 user

By comparison, egrep (another DFA) does it in 0.8 seconds.

Reopening.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW


http://sources.redhat.com/bugzilla/show_bug.cgi?id=501

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug regex/501] transit_state is slow
  2004-11-04  8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
                   ` (6 preceding siblings ...)
  2004-12-01 17:01 ` bonzini at gnu dot org
@ 2004-12-10 16:50 ` bonzini at gnu dot org
  2004-12-20 10:21 ` bonzini at gnu dot org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-12-10 16:50 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2004-12-10 16:50 -------
Created an attachment (id=306)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=306&action=view)
updated patch

The original patch does not apply cleanly anymore.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #273 is|0                           |1
           obsolete|                            |


http://sources.redhat.com/bugzilla/show_bug.cgi?id=501

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug regex/501] transit_state is slow
  2004-11-04  8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
                   ` (7 preceding siblings ...)
  2004-12-10 16:50 ` bonzini at gnu dot org
@ 2004-12-20 10:21 ` bonzini at gnu dot org
  2004-12-20 10:22 ` bonzini at gnu dot org
  2004-12-27 16:40 ` drepper at redhat dot com
  10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-12-20 10:21 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2004-12-20 10:21 -------
I'm putting pending patches at P1, just as a convention.


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P1


http://sources.redhat.com/bugzilla/show_bug.cgi?id=501

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug regex/501] transit_state is slow
  2004-11-04  8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
                   ` (8 preceding siblings ...)
  2004-12-20 10:21 ` bonzini at gnu dot org
@ 2004-12-20 10:22 ` bonzini at gnu dot org
  2004-12-27 16:40 ` drepper at redhat dot com
  10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-12-20 10:22 UTC (permalink / raw)
  To: glibc-bugs-regex



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


http://sources.redhat.com/bugzilla/show_bug.cgi?id=501

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug regex/501] transit_state is slow
  2004-11-04  8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
                   ` (9 preceding siblings ...)
  2004-12-20 10:22 ` bonzini at gnu dot org
@ 2004-12-27 16:40 ` drepper at redhat dot com
  10 siblings, 0 replies; 12+ messages in thread
From: drepper at redhat dot com @ 2004-12-27 16:40 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From drepper at redhat dot com  2004-12-27 16:40 -------
I've applied the patch.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://sources.redhat.com/bugzilla/show_bug.cgi?id=501

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2004-12-27 16:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-11-04  8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
2004-11-10 10:55 ` [Bug regex/501] " bonzini at gnu dot org
2004-11-10 14:05 ` drepper at redhat dot com
2004-11-10 14:19 ` bonzini at gnu dot org
2004-11-11  9:20 ` bonzini at gnu dot org
2004-11-13 12:59 ` bonzini at gnu dot org
2004-11-13 13:00 ` bonzini at gnu dot org
2004-12-01 17:01 ` bonzini at gnu dot org
2004-12-10 16:50 ` bonzini at gnu dot org
2004-12-20 10:21 ` bonzini at gnu dot org
2004-12-20 10:22 ` bonzini at gnu dot org
2004-12-27 16:40 ` drepper at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).