From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Zack Weinberg"
To: Eli Zaretskii
Cc: dj@redhat.com, gcc@gcc.gnu.org, gdb@sources.redhat.com, binutils@sources.redhat.com, cygwin@sources.redhat.com
Subject: Re: Another RFC: regex in libiberty
Date: Fri, 08 Jun 2001 09:59:00 -0000
Message-id: <20010608095932.S979@stanford.edu>
References: <9003-Fri08Jun2001100651+0300-eliz@is.elta.co.il>
X-SW-Source: 2001-06/msg00400.html

On Fri, Jun 08, 2001 at 10:06:51AM +0300, Eli Zaretskii wrote:
>
> One notorious problem with GNU regex is that it is quite slow for many
> simple jobs, such as matching a simple regular expression with no
> backtracking.  It seems that the main reason for this slowness is the
> fact that GNU regex supports null characters in strings.  For
> example, Sed 3.02 compiled with GNU regex is about 2-4 times slower
> on simple jobs than the same Sed compiled with Spencer's regex
> library.

I think the null characters are a red herring.

I looked into GNU regex's performance in the context of GCC's
fixincludes program, last year.  On a platform that has mostly-okay
headers, fixincludes spends most of its time matching regular
expressions.  The regex.c that came with GDB 4.18, which I think is
the one that got spread around widely, had a bug in its implementation
of the POSIX regcomp/regexec interface, which caused a major
performance hit.  That bug has been fixed in GNU libc for a long time.
When I replaced fixincludes' copy of regex.c with a more recent
version from glibc, fixincludes was sped up by a factor of nine.

That same bug affects Sed 3.02 - swap the regex.c it ships with for
the one from glibc 2.2.x and I bet you'll see better performance.

There's some discussion in these messages:

http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00764.html
http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00765.html

The relevant fix is in there, too, if you want to pull it out and
apply it.

I did some benchmarking of fixincludes with Spencer's regexp library
as well.  IIRC, it was about the same as the fixed GNU regex.c.

-- 
zw  This is, no doubt, the rational strategy; quite possibly the only one
    that will work.  But it ignores the exigencies of the tenure system and
    is therefore impractical.
        -- Jerry Fodor, _The Mind Doesn't Work That Way_