public inbox for glibc-bugs-regex@sourceware.org
help / color / mirror / Atom feed
* [Bug regex/501] New: transit_state is slow
@ 2004-11-04 8:51 bonzini at gnu dot org
2004-11-10 10:55 ` [Bug regex/501] " bonzini at gnu dot org
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-11-04 8:51 UTC (permalink / raw)
To: glibc-bugs-regex
transit_state does way a lot scaffolding before actually transiting to a new
state. We have to look at state->word_trtable to check if we are dealing with a
256-entry or 512-entry transition table. The patch at
http://sources.redhat.com/ml/libc-alpha/2004-10/msg00164.html
helps by storing the pointers to the two kinds of transition table in two
different fields of the structure. We can penalize the 512-entry transition
table, because it calls iswalnum and this is very slow anyway.
--
Summary: transit_state is slow
Product: glibc
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: regex
AssignedTo: bonzini at gnu dot org
ReportedBy: bonzini at gnu dot org
CC: glibc-bugs-regex at sources dot redhat dot com,glibc-
bugs at sources dot redhat dot com
OtherBugsDependingO 500
nThis:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=501
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug regex/501] transit_state is slow
2004-11-04 8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
@ 2004-11-10 10:55 ` bonzini at gnu dot org
2004-11-10 14:05 ` drepper at redhat dot com
` (9 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-11-10 10:55 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From bonzini at gnu dot org 2004-11-10 10:54 -------
Created an attachment (id=273)
--> (http://sources.redhat.com/bugzilla/attachment.cgi?id=273&action=view)
updated patch
--
http://sources.redhat.com/bugzilla/show_bug.cgi?id=501
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug regex/501] transit_state is slow
2004-11-04 8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
2004-11-10 10:55 ` [Bug regex/501] " bonzini at gnu dot org
@ 2004-11-10 14:05 ` drepper at redhat dot com
2004-11-10 14:19 ` bonzini at gnu dot org
` (8 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: drepper at redhat dot com @ 2004-11-10 14:05 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From drepper at redhat dot com 2004-11-10 14:04 -------
I fail to see what your patch is supposed to improve. From what I can see it
replaces the bit word_trtable with a pointer. Now ->trtable == NULL means the
same as word_trtable == 1 before. The code in transit_state is slightly
reordered but in no way that should make a difference.
Please explain.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |WAITING
http://sources.redhat.com/bugzilla/show_bug.cgi?id=501
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug regex/501] transit_state is slow
2004-11-04 8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
2004-11-10 10:55 ` [Bug regex/501] " bonzini at gnu dot org
2004-11-10 14:05 ` drepper at redhat dot com
@ 2004-11-10 14:19 ` bonzini at gnu dot org
2004-11-11 9:20 ` bonzini at gnu dot org
` (7 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-11-10 14:19 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From bonzini at gnu dot org 2004-11-10 14:18 -------
The best path with my patch is
trtable = state->trtable;
if (BE (trtable != NULL, 1))
return trtable[ch];
versus
trtable = state->trtable;
if (trtable == NULL)
...
if (BE (state->word_trtable, 0))
...
else
return trtable[ch];
where state->word_trtable was a bitfield. Removing the access to this bitfield
does make a difference of about 3-4% on some regexps because this is a very busy
path. A regexp which benefits from this is simply .*; an NFA matcher can do
this in O(1) time, while we are bound to O(N) anyway, so it is important to make
it as fast as possible.
--
http://sources.redhat.com/bugzilla/show_bug.cgi?id=501
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug regex/501] transit_state is slow
2004-11-04 8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
` (2 preceding siblings ...)
2004-11-10 14:19 ` bonzini at gnu dot org
@ 2004-11-11 9:20 ` bonzini at gnu dot org
2004-11-13 12:59 ` bonzini at gnu dot org
` (6 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-11-11 9:20 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From bonzini at gnu dot org 2004-11-11 09:20 -------
Here is a testcase. Timings taken on a PPC G4 at 1.5 GHz
time perl -e 'print "A" x 10000' | ./sed -n '/[A-Z].*[0-9]$/p'
without patch: 1.98s
with patch: 1.89s
time perl -e 'print "A" x 30000' | ./sed -n '/[A-Z].*[0-9]$/p'
without patch: 17.59s
with patch: 17.11s
In this case, regex complexity is quadratic. This is a savings of 2.7%; I
remember seeing slightly higher figures on x86.
--
http://sources.redhat.com/bugzilla/show_bug.cgi?id=501
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug regex/501] transit_state is slow
2004-11-04 8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
` (3 preceding siblings ...)
2004-11-11 9:20 ` bonzini at gnu dot org
@ 2004-11-13 12:59 ` bonzini at gnu dot org
2004-11-13 13:00 ` bonzini at gnu dot org
` (5 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-11-13 12:59 UTC (permalink / raw)
To: glibc-bugs-regex
--
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |SUSPENDED
http://sources.redhat.com/bugzilla/show_bug.cgi?id=501
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug regex/501] transit_state is slow
2004-11-04 8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
` (4 preceding siblings ...)
2004-11-13 12:59 ` bonzini at gnu dot org
@ 2004-11-13 13:00 ` bonzini at gnu dot org
2004-12-01 17:01 ` bonzini at gnu dot org
` (4 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-11-13 13:00 UTC (permalink / raw)
To: glibc-bugs-regex
--
What |Removed |Added
----------------------------------------------------------------------------
Status|SUSPENDED |WAITING
http://sources.redhat.com/bugzilla/show_bug.cgi?id=501
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug regex/501] transit_state is slow
2004-11-04 8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
` (5 preceding siblings ...)
2004-11-13 13:00 ` bonzini at gnu dot org
@ 2004-12-01 17:01 ` bonzini at gnu dot org
2004-12-10 16:50 ` bonzini at gnu dot org
` (3 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-12-01 17:01 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From bonzini at gnu dot org 2004-12-01 17:01 -------
I can reproduce a saving of 3% on non-quadratic cases too. This testcase just
chews up 100 megabytes of data
perl -e 'for ($i = 1; $i < 10000; $i++)
{print ("A" x 10000); print "\n"}' | /usr/bin/time ./sed -ne 's/.*//'
without patch: 9.31 user
with patch: 9.05 user
By comparison, egrep (another DFA) does it in 0.8 seconds.
Reopening.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |NEW
http://sources.redhat.com/bugzilla/show_bug.cgi?id=501
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug regex/501] transit_state is slow
2004-11-04 8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
` (6 preceding siblings ...)
2004-12-01 17:01 ` bonzini at gnu dot org
@ 2004-12-10 16:50 ` bonzini at gnu dot org
2004-12-20 10:21 ` bonzini at gnu dot org
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-12-10 16:50 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From bonzini at gnu dot org 2004-12-10 16:50 -------
Created an attachment (id=306)
--> (http://sources.redhat.com/bugzilla/attachment.cgi?id=306&action=view)
updated patch
The original patch does not apply cleanly anymore.
--
What |Removed |Added
----------------------------------------------------------------------------
Attachment #273 is|0 |1
obsolete| |
http://sources.redhat.com/bugzilla/show_bug.cgi?id=501
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug regex/501] transit_state is slow
2004-11-04 8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
` (7 preceding siblings ...)
2004-12-10 16:50 ` bonzini at gnu dot org
@ 2004-12-20 10:21 ` bonzini at gnu dot org
2004-12-20 10:22 ` bonzini at gnu dot org
2004-12-27 16:40 ` drepper at redhat dot com
10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-12-20 10:21 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From bonzini at gnu dot org 2004-12-20 10:21 -------
I'm putting pending patches at P1, just as a convention.
--
What |Removed |Added
----------------------------------------------------------------------------
Priority|P2 |P1
http://sources.redhat.com/bugzilla/show_bug.cgi?id=501
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug regex/501] transit_state is slow
2004-11-04 8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
` (8 preceding siblings ...)
2004-12-20 10:21 ` bonzini at gnu dot org
@ 2004-12-20 10:22 ` bonzini at gnu dot org
2004-12-27 16:40 ` drepper at redhat dot com
10 siblings, 0 replies; 12+ messages in thread
From: bonzini at gnu dot org @ 2004-12-20 10:22 UTC (permalink / raw)
To: glibc-bugs-regex
--
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
http://sources.redhat.com/bugzilla/show_bug.cgi?id=501
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug regex/501] transit_state is slow
2004-11-04 8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
` (9 preceding siblings ...)
2004-12-20 10:22 ` bonzini at gnu dot org
@ 2004-12-27 16:40 ` drepper at redhat dot com
10 siblings, 0 replies; 12+ messages in thread
From: drepper at redhat dot com @ 2004-12-27 16:40 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From drepper at redhat dot com 2004-12-27 16:40 -------
I've applied the patch.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
http://sources.redhat.com/bugzilla/show_bug.cgi?id=501
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2004-12-27 16:40 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-11-04 8:51 [Bug regex/501] New: transit_state is slow bonzini at gnu dot org
2004-11-10 10:55 ` [Bug regex/501] " bonzini at gnu dot org
2004-11-10 14:05 ` drepper at redhat dot com
2004-11-10 14:19 ` bonzini at gnu dot org
2004-11-11 9:20 ` bonzini at gnu dot org
2004-11-13 12:59 ` bonzini at gnu dot org
2004-11-13 13:00 ` bonzini at gnu dot org
2004-12-01 17:01 ` bonzini at gnu dot org
2004-12-10 16:50 ` bonzini at gnu dot org
2004-12-20 10:21 ` bonzini at gnu dot org
2004-12-20 10:22 ` bonzini at gnu dot org
2004-12-27 16:40 ` drepper at redhat dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).