public inbox for glibc-bugs-regex@sourceware.org
* [Bug regex/11159] New: lock contention within regexec() when used from multiple threads
From: extproxy at gmail dot com @ 2010-01-11  9:40 UTC (permalink / raw)
  To: glibc-bugs-regex

I have a program that uses multiple threads, each of which makes heavy use of
regular expression matching through the glibc regexec() function.
Unfortunately, this function seems to acquire a global lock, which causes poor
performance in a multi-threaded environment.

I'm not even sure what regexec() needs to lock; as far as I can tell it doesn't
need access to any global state, except perhaps some global locale object. In
any case, it shouldn't need to acquire a write lock: a read lock should
suffice. Alternatively, a thread-local data structure could be used.

I hope future releases of glibc can address this performance bug.

-- 
           Summary: lock contention within regexec() when used from multiple
                    threads
           Product: glibc
           Version: 2.10
            Status: NEW
          Severity: normal
          Priority: P2
         Component: regex
        AssignedTo: drepper at redhat dot com
        ReportedBy: extproxy at gmail dot com
                CC: glibc-bugs-regex at sources dot redhat dot com,
                    glibc-bugs at sources dot redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=11159

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


* [Bug regex/11159] lock contention within regexec() when used from multiple threads
From: schwab at linux-m68k dot org @ 2010-01-11  9:58 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From schwab at linux-m68k dot org  2010-01-11 09:58 -------
Use a separate regex_t object in each thread.
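A minimal sketch of that advice in C, assuming POSIX threads and a hypothetical worker layout (worker_arg, count_matches_parallel, and MAX_THREADS are illustrative names, not glibc APIs). Every thread compiles the same pattern text into its own regex_t, so regexec() calls in different threads never touch the same object or its lock:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <regex.h>
#include <stddef.h>

#define MAX_THREADS 16  /* hypothetical cap to keep the sketch malloc-free */

/* Hypothetical work item: one slice of input lines per thread. */
struct worker_arg {
    const char *pattern;      /* same pattern text for every thread */
    const char *const *lines; /* this worker's slice */
    size_t nlines;
    size_t matches;           /* result, written by the worker */
    int error;                /* nonzero if regcomp() failed */
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    regex_t re;               /* private to this thread: no shared lock */

    arg->matches = 0;
    arg->error = regcomp(&re, arg->pattern, REG_EXTENDED | REG_NOSUB);
    if (arg->error != 0)
        return NULL;

    for (size_t i = 0; i < arg->nlines; i++)
        if (regexec(&re, arg->lines[i], 0, NULL, 0) == 0)
            arg->matches++;

    regfree(&re);
    return NULL;
}

/* Splits lines across nthreads workers; returns total matches or -1. */
long count_matches_parallel(const char *pattern, const char *const *lines,
                            size_t nlines, size_t nthreads)
{
    pthread_t tid[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    size_t started = 0;
    long total = 0;
    int failed = 0;

    if (nthreads == 0 || nthreads > MAX_THREADS)
        return -1;

    size_t per = (nlines + nthreads - 1) / nthreads;
    for (size_t t = 0; t < nthreads; t++) {
        size_t begin = t * per < nlines ? t * per : nlines;
        size_t end = begin + per < nlines ? begin + per : nlines;
        args[t] = (struct worker_arg){ pattern, lines + begin, end - begin, 0, 0 };
        if (pthread_create(&tid[t], NULL, worker, &args[t]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (size_t t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        if (args[t].error != 0)
            failed = 1;
        total += (long)args[t].matches;
    }
    return failed ? -1 : total;
}
```

The cost is one regcomp() per worker; in exchange, concurrent regexec() calls no longer serialize on a shared object. Link with -pthread.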

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID


* [Bug regex/11159] lock contention within regexec() when used from multiple threads
From: extproxy at gmail dot com @ 2010-01-11 17:46 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From extproxy at gmail dot com  2010-01-11 17:45 -------

> Use a separate regex_t object in each thread.

Why is that? The regexec() interface takes a 'const regex_t *' argument. This
implies that multiple threads can use the same object.

In my program, all threads work with the same regular expression. So why use a
different regex_t object in each thread?

At the very least, the regexec() documentation needs to describe this
performance limitation. I'm reopening this bug; if you still don't agree that
it should be fixed, please retitle it as a documentation bug.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |


* [Bug regex/11159] lock contention within regexec() when used from multiple threads
From: bonzini at gnu dot org @ 2010-01-11 17:55 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2010-01-11 17:55 -------
The fact that it is "const" does not mean that no internal data structures are
modified (and such modifications need locking).  C++ even has a "mutable"
keyword for this.  glibc does the locking per regex_t, not globally.
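C has no mutable keyword, but the same effect falls out of pointer members: a const struct pointer makes the struct's own fields read-only, not the memory those fields point to. The sketch below assumes a hypothetical squarer object whose cache and mutex live behind such a pointer; it only illustrates why a function taking a const pointer can still need a per-object lock, and is not glibc's actual regex_t layout:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <string.h>

#define CACHE_SIZE 64

/* Mutable state that lives behind the object. */
struct square_cache {
    pthread_mutex_t lock;            /* one lock per object, not a global one */
    long value[CACHE_SIZE];
    unsigned char known[CACHE_SIZE];
    unsigned long misses;            /* how often the slow path ran */
};

/* Hypothetical object handed out to callers. */
struct squarer {
    struct square_cache *cache;      /* pointer member: the pointee stays writable */
};

int squarer_init(struct squarer *s, struct square_cache *c)
{
    memset(c->value, 0, sizeof c->value);
    memset(c->known, 0, sizeof c->known);
    c->misses = 0;
    s->cache = c;
    return pthread_mutex_init(&c->lock, NULL);
}

/* The interface promises not to modify *s, yet it fills in *s->cache.
   Threads sharing one squarer therefore serialize on its lock. */
long square_of(const struct squarer *s, unsigned n)
{
    if (n >= CACHE_SIZE)
        return (long)n * n;          /* uncached path */

    struct square_cache *c = s->cache;
    pthread_mutex_lock(&c->lock);
    if (!c->known[n]) {
        c->value[n] = (long)n * n;   /* stand-in for expensive work */
        c->known[n] = 1;
        c->misses++;
    }
    long v = c->value[n];
    pthread_mutex_unlock(&c->lock);
    return v;
}
```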

This could be a documentation bug; I'm leaving that decision to the glibc
maintainers.


* [Bug regex/11159] lock contention within regexec() when used from multiple threads
From: extproxy at gmail dot com @ 2010-01-11 18:01 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From extproxy at gmail dot com  2010-01-11 18:01 -------

Out of curiosity, what exactly does the regex_t lock protect?


* [Bug regex/11159] lock contention within regexec() when used from multiple threads
From: bonzini at gnu dot org @ 2010-01-11 18:03 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From bonzini at gnu dot org  2010-01-11 18:03 -------
regexec() converts the NFA to a DFA on demand, so the DFA representation is
locked (and some other state too; TLS could indeed be used for that part
because it is per-match data, but the DFA states are persistent).
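To make the on-demand conversion concrete, here is a toy sketch, not glibc's engine: the pattern language is reduced to "text contains this literal", the set of active NFA states is a bitmask updated shift-and style, and each new state set becomes a DFA state the first time a transition reaches it. All names (lazy_dfa, lazy_dfa_search, MAX_STATES) are hypothetical. The transition cache is persistent and shared, so it is built under the object's lock; the loop variables are per-match data that could live on the stack or in TLS:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define MAX_LIT 31      /* literal length limit: NFA states 0..len fit in 32 bits */
#define MAX_STATES 64   /* cap on cached DFA states */

/* Toy compiled pattern: "text contains lit" (hypothetical, not regex_t). */
struct lazy_dfa {
    pthread_mutex_t lock;            /* per-object lock guarding the cache */
    unsigned len;                    /* literal length; NFA accept state = len */
    uint32_t byte_bits[256];         /* bit i set iff lit[i] == byte */
    uint32_t masks[MAX_STATES];      /* DFA state id -> set of active NFA states */
    int next[MAX_STATES][256];       /* cached transitions; -1 = not built yet */
    int nstates;                     /* number of DFA states built so far */
};

int lazy_dfa_init(struct lazy_dfa *d, const char *lit)
{
    size_t n = strlen(lit);
    if (n == 0 || n > MAX_LIT)
        return -1;
    memset(d->byte_bits, 0, sizeof d->byte_bits);
    for (size_t i = 0; i < n; i++)
        d->byte_bits[(unsigned char)lit[i]] |= (uint32_t)1 << i;
    memset(d->next, -1, sizeof d->next);   /* every byte 0xFF, so every int is -1 */
    d->len = (unsigned)n;
    d->masks[0] = 1;                       /* start: only NFA state 0 is active */
    d->nstates = 1;
    return pthread_mutex_init(&d->lock, NULL) == 0 ? 0 : -1;
}

/* Caller holds d->lock. Returns the id of the DFA state for mask,
   creating it if needed, or -1 if the cache is full. */
static int state_for(struct lazy_dfa *d, uint32_t mask)
{
    for (int s = 0; s < d->nstates; s++)
        if (d->masks[s] == mask)
            return s;
    if (d->nstates == MAX_STATES)
        return -1;
    d->masks[d->nstates] = mask;
    return d->nstates++;
}

/* Returns 1 if text contains the literal, 0 if not, -1 if the cache overflowed. */
int lazy_dfa_search(struct lazy_dfa *d, const char *text)
{
    const uint32_t accept = (uint32_t)1 << d->len;
    int cur = 0, result = 0;                 /* per-match data */

    pthread_mutex_lock(&d->lock);            /* held for the whole search */
    for (const unsigned char *p = (const unsigned char *)text; *p != '\0'; p++) {
        int nxt = d->next[cur][*p];
        if (nxt < 0) {
            /* Transition not built yet: determinize this one step on demand.
               NFA state 0 loops on every byte; state i advances on lit[i]. */
            uint32_t m = 1u | ((d->masks[cur] & d->byte_bits[*p]) << 1);
            nxt = state_for(d, m);
            if (nxt < 0) {
                result = -1;
                break;
            }
            d->next[cur][*p] = nxt;          /* persists for later searches */
        }
        cur = nxt;
        if (d->masks[cur] & accept) {
            result = 1;
            break;
        }
    }
    pthread_mutex_unlock(&d->lock);
    return result;
}
```

This sketch holds the lock for the whole search, the simplest correct choice; as a consequence, two threads searching with the same lazy_dfa serialize even when every transition they need is already cached.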


* [Bug regex/11159] lock contention within regexec() when used from multiple threads
From: drepper at redhat dot com @ 2010-01-15  7:51 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From drepper at redhat dot com  2010-01-15 07:50 -------
No, you cannot use TLS.  The semantics are that when a regex_t is used in one
thread and then in another, the side effects carry over.

If you know this isn't needed, use separate regex_t objects.
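One way to follow that advice without passing a regex_t through every call is POSIX thread-specific data: each thread compiles its own copy on first use, and a key destructor frees it when the thread exits. A sketch under those assumptions; LOG_PATTERN, thread_regex, and is_error_line are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical fixed pattern used by every thread. */
#define LOG_PATTERN "^(error|fatal):"

static pthread_key_t re_key;
static pthread_once_t re_once = PTHREAD_ONCE_INIT;
static int re_key_err;

/* Key destructor: runs at thread exit for each thread that stored a copy. */
static void free_thread_regex(void *p)
{
    regfree(p);
    free(p);
}

static void make_key(void)
{
    re_key_err = pthread_key_create(&re_key, free_thread_regex);
}

/* Returns this thread's private compiled regex, compiling it on first use,
   or NULL on failure. No other thread ever sees this regex_t. */
static regex_t *thread_regex(void)
{
    pthread_once(&re_once, make_key);
    if (re_key_err != 0)
        return NULL;

    regex_t *re = pthread_getspecific(re_key);
    if (re == NULL) {
        re = malloc(sizeof *re);
        if (re == NULL)
            return NULL;
        if (regcomp(re, LOG_PATTERN, REG_EXTENDED | REG_NOSUB) != 0) {
            free(re);
            return NULL;
        }
        if (pthread_setspecific(re_key, re) != 0) {
            regfree(re);
            free(re);
            return NULL;
        }
    }
    return re;
}

/* 1 = match, 0 = no match, -1 = setup failure. */
int is_error_line(const char *line)
{
    regex_t *re = thread_regex();
    if (re == NULL)
        return -1;
    return regexec(re, line, 0, NULL, 0) == 0;
}
```

Key destructors run only for threads that exit through pthread_exit() or by returning from their start routine; the copy owned by the main thread is not freed when main() returns, which is usually acceptable at process exit.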

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |WONTFIX

