public inbox for libc-hacker@sourceware.org
 help / color / mirror / Atom feed
* Regex memory management
@ 2002-10-30  7:56 Jakub Jelinek
  2002-11-02 16:25 ` Roland McGrath
  2002-11-02 16:26 ` Roland McGrath
  0 siblings, 2 replies; 4+ messages in thread
From: Jakub Jelinek @ 2002-10-30  7:56 UTC (permalink / raw)
  To: Isamu Hasegawa; +Cc: Glibc hackers

Hi!

I wonder whether it wouldn't be a win to switch to using obstacks for regex
(one created by regcomp and destroyed in regfree, one created at start
of regexec and destroyed when it is done searching).
Looking at typical regex malloc/free patterns (just read bug-regex2.mtrace
or bug-regex14.mtrace), it is something like ~ 40 mallocs followed
by ~15 free calls (regcomp), then a couple of (~ 12 mallocs followed by
~ 7 frees of these; during first regexec) and then always a one or more
mallocs followed by one or more frees for each remaining regexec.

From this, I think it could be a win to have 2 different obstacks, one
permanent, inited at regcomp start, freed in regfree, the other would be
temporary, inited at regcomp start and freed at regcomp end and inited
at regexec start, freed at regexec end.

The obstack_chunk_alloc routine would have to either longjmp
to regcomp/regexec cleanup handler, or when glibc starts to use try/finally
it could throw exception. The temporary allocation obstacks might even use
alloca for the initial (say 2K) buffer - the only user of big malloc chunk
seems to be build_trtable.

Attached is just a patch which changes big temporary allocations
in build_trtable from set of 5 malloc calls to 2 alloca calls and adds some
ENOMEM handling. group_nodes_into_DFAstates can still leak if malloc returns
NULL and I haven't checked other routines.

2002-10-30  Jakub Jelinek  <jakub@redhat.com>

	* posix/regexec.c (build_trtable): Alloca or malloc dests_node and
	dests_ch arrays together.  Alloca or malloc dest_states,
	dest_states_word and dest_states_nl arrays together.  Free memory on
	error exit.

--- libc/posix/regexec.c.jj	2002-10-21 19:14:52.000000000 +0200
+++ libc/posix/regexec.c	2002-10-30 15:41:50.000000000 +0100
@@ -2469,8 +2469,10 @@ build_trtable (preg, state, fl_search)
   reg_errcode_t err;
   re_dfa_t *dfa = (re_dfa_t *) preg->buffer;
   int i, j, k, ch;
+  int dests_node_malloced = 0, dest_states_malloced = 0;
   int ndests; /* Number of the destination states from `state'.  */
-  re_dfastate_t **trtable, **dest_states, **dest_states_word, **dest_states_nl;
+  re_dfastate_t **trtable;
+  re_dfastate_t **dest_states = NULL, **dest_states_word, **dest_states_nl;
   re_node_set follows, *dests_node;
   bitset *dests_ch;
   bitset acceptable;
@@ -2479,34 +2481,76 @@ build_trtable (preg, state, fl_search)
      from `state'.  `dests_node[i]' represents the nodes which i-th
      destination state contains, and `dests_ch[i]' represents the
      characters which i-th destination state accepts.  */
-  dests_node = re_malloc (re_node_set, SBC_MAX);
-  dests_ch = re_malloc (bitset, SBC_MAX);
+#ifdef _LIBC
+  if (__libc_use_alloca ((sizeof (re_node_set) + sizeof (bitset)) * SBC_MAX))
+    dests_node = (re_node_set *)
+		 alloca ((sizeof (re_node_set) + sizeof (bitset)) * SBC_MAX);
+  else
+#endif
+    {
+      dests_node = (re_node_set *)
+		   malloc ((sizeof (re_node_set) + sizeof (bitset)) * SBC_MAX);
+      if (BE (dests_node == NULL, 0))
+	return NULL;
+      dests_node_malloced = 1;
+    }
+  dests_ch = (bitset *) (dests_node + SBC_MAX);
 
   /* Initialize transiton table.  */
   trtable = (re_dfastate_t **) calloc (sizeof (re_dfastate_t *), SBC_MAX);
-  if (BE (dests_node == NULL || dests_ch == NULL || trtable == NULL, 0))
-    return NULL;
+  if (BE (trtable == NULL, 0))
+    {
+      if (dests_node_malloced)
+	free (dests_node);
+      return NULL;
+    }
 
   /* At first, group all nodes belonging to `state' into several
      destinations.  */
   ndests = group_nodes_into_DFAstates (preg, state, dests_node, dests_ch);
   if (BE (ndests <= 0, 0))
     {
-      re_free (dests_node);
-      re_free (dests_ch);
+      if (dests_node_malloced)
+	free (dests_node);
       /* Return NULL in case of an error, trtable otherwise.  */
-      return (ndests < 0) ? NULL : trtable;
+      if (ndests == 0)
+	return trtable;
+      free (trtable);
+      return NULL;
     }
 
-  dest_states = re_malloc (re_dfastate_t *, ndests);
-  dest_states_word = re_malloc (re_dfastate_t *, ndests);
-  dest_states_nl = re_malloc (re_dfastate_t *, ndests);
-  bitset_empty (acceptable);
-
   err = re_node_set_alloc (&follows, ndests + 1);
-  if (BE (dest_states == NULL || dest_states_word == NULL
-          || dest_states_nl == NULL || err != REG_NOERROR, 0))
-    return NULL;
+  if (BE (err != REG_NOERROR, 0))
+    goto out_free;
+
+#ifdef _LIBC
+  if (__libc_use_alloca ((sizeof (re_node_set) + sizeof (bitset)) * SBC_MAX
+			 + ndests * 3 * sizeof (re_dfastate_t *)))
+    dest_states = (re_dfastate_t **)
+		  alloca (ndests * 3 * sizeof (re_dfastate_t *));
+  else
+#endif
+    {
+      dest_states = (re_dfastate_t **)
+		    malloc (ndests * 3 * sizeof (re_dfastate_t *));
+      if (BE (dest_states == NULL, 0))
+	{
+out_free:
+	  if (dest_states_malloced)
+	    free (dest_states);
+	  re_node_set_free (&follows);
+	  for (i = 0; i < ndests; ++i)
+	    re_node_set_free (dests_node + i);
+	  free (trtable);
+	  if (dests_node_malloced)
+	    free (dests_node);
+	  return NULL;
+	}
+      dest_states_malloced = 1;
+    }
+  dest_states_word = dest_states + ndests;
+  dest_states_nl = dest_states_word + ndests;
+  bitset_empty (acceptable);
 
   /* Then build the states for all destinations.  */
   for (i = 0; i < ndests; ++i)
@@ -2521,7 +2565,7 @@ build_trtable (preg, state, fl_search)
             {
               err = re_node_set_merge (&follows, dfa->eclosures + next_node);
               if (BE (err != REG_NOERROR, 0))
-                return NULL;
+		goto out_free;
             }
         }
       /* If search flag is set, merge the initial state.  */
@@ -2541,12 +2585,12 @@ build_trtable (preg, state, fl_search)
               err = re_node_set_merge (&follows,
                                        dfa->init_state->entrance_nodes);
               if (BE (err != REG_NOERROR, 0))
-                return NULL;
+                goto out_free;
             }
         }
       dest_states[i] = re_acquire_state_context (&err, dfa, &follows, 0);
       if (BE (dest_states[i] == NULL && err != REG_NOERROR, 0))
-        return NULL;
+        goto out_free;
       /* If the new state has context constraint,
          build appropriate states for these contexts.  */
       if (dest_states[i]->has_constraint)
@@ -2554,11 +2598,11 @@ build_trtable (preg, state, fl_search)
           dest_states_word[i] = re_acquire_state_context (&err, dfa, &follows,
                                                           CONTEXT_WORD);
           if (BE (dest_states_word[i] == NULL && err != REG_NOERROR, 0))
-            return NULL;
+            goto out_free;
           dest_states_nl[i] = re_acquire_state_context (&err, dfa, &follows,
                                                         CONTEXT_NEWLINE);
           if (BE (dest_states_nl[i] == NULL && err != REG_NOERROR, 0))
-            return NULL;
+            goto out_free;
         }
       else
         {
@@ -2615,16 +2659,15 @@ build_trtable (preg, state, fl_search)
           }
     }
 
-  re_free (dest_states_nl);
-  re_free (dest_states_word);
-  re_free (dest_states);
+  if (dest_states_malloced)
+    free (dest_states);
 
   re_node_set_free (&follows);
   for (i = 0; i < ndests; ++i)
     re_node_set_free (dests_node + i);
 
-  re_free (dests_ch);
-  re_free (dests_node);
+  if (dests_node_malloced)
+    free (dests_node);
 
   return trtable;
 }


	Jakub

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Regex memory management
  2002-10-30  7:56 Regex memory management Jakub Jelinek
@ 2002-11-02 16:25 ` Roland McGrath
  2002-11-02 21:52   ` Jakub Jelinek
  2002-11-02 16:26 ` Roland McGrath
  1 sibling, 1 reply; 4+ messages in thread
From: Roland McGrath @ 2002-11-02 16:25 UTC (permalink / raw)
  To: Isamu Hasegawa; +Cc: Jakub Jelinek <jakub@redhat.com>     Glibc hackers

We really should have the regex code audited for possible memory leaks in
error conditions, since Jakub has already observed some.  If Isamu doesn't
have time, perhaps someone else can find time to do it soon.  Jakub's
obstack suggestions sound good to me, and converting to that makes it
vastly easier to be sure the code is correct.  But I don't know the regex
code and its allocation properties at all, so as to say that the obstack
model fits acceptably right in all cases.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Regex memory management
  2002-10-30  7:56 Regex memory management Jakub Jelinek
  2002-11-02 16:25 ` Roland McGrath
@ 2002-11-02 16:26 ` Roland McGrath
  1 sibling, 0 replies; 4+ messages in thread
From: Roland McGrath @ 2002-11-02 16:26 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Isamu Hasegawa, Glibc hackers

I put this patch in, since it is clearly correct regardless of
regex-specific questions.  I'd like Isamu to respond on the general
questions about allocation in regex.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Regex memory management
  2002-11-02 16:25 ` Roland McGrath
@ 2002-11-02 21:52   ` Jakub Jelinek
  0 siblings, 0 replies; 4+ messages in thread
From: Jakub Jelinek @ 2002-11-02 21:52 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Isamu Hasegawa, Glibc hackers

On Sat, Nov 02, 2002 at 04:25:09PM -0800, Roland McGrath wrote:
> We really should have the regex code audited for possible memory leaks in
> error conditions, since Jakub has already observed some.  If Isamu doesn't
> have time, perhaps someone else can find time to do it soon.  Jakub's
> obstack suggestions sound good to me, and converting to that makes it
> vastly easier to be sure the code is correct.  But I don't know the regex
> code and its allocation properties at all, so as to say that the obstack
> model fits acceptably right in all cases.

I can try to convert it to use obstacks, but would like to hear from Isamu
about it first...

	Jakub

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2002-11-03  5:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-10-30  7:56 Regex memory management Jakub Jelinek
2002-11-02 16:25 ` Roland McGrath
2002-11-02 21:52   ` Jakub Jelinek
2002-11-02 16:26 ` Roland McGrath

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).