* [PATCH] Fix bug-regex20.c
@ 2003-11-25 19:19 Jakub Jelinek
2003-11-26 7:25 ` Ulrich Drepper
From: Jakub Jelinek @ 2003-11-25 19:19 UTC
To: Ulrich Drepper, Roland McGrath; +Cc: Glibc hackers
Hi!
re_string_reconstruct relied on bufs_len >= pstr->mb_cur_max.
Also, my UTF-8 re_string_reconstruct optimization needlessly failed to
handle idx pointing into the middle of a UTF-8 character and fell back
to the expensive re_string_skip_chars.
With this patch, all glibc regex tests pass even with
LD_PRELOAD=libefence.so.0.
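To illustrate the partial-character case, here is a minimal sketch (not the glibc code) of the backward scan: find the lead byte before a given offset and count how many bytes of that character lie at or after the offset. Those are the positions whose wcs entries the patch sets to WEOF. The helper names utf8_seq_len and partial_tail_len are hypothetical, and the sequence length is read from the lead byte instead of calling mbrtowc, so no locale setup is needed. The sketch does not reject overlong or otherwise invalid encodings.

```c
#include <stddef.h>

/* Hypothetical helper: length in bytes of the UTF-8 sequence introduced
   by LEAD, or 0 if LEAD is a continuation byte or not a lead byte.  */
static int
utf8_seq_len (unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if ((lead & 0xe0) == 0xc0)
    return 2;
  if ((lead & 0xf0) == 0xe0)
    return 3;
  if ((lead & 0xf8) == 0xf0)
    return 4;
  return 0;
}

/* Hypothetical helper: number of bytes at and after OFFSET that belong
   to a character starting before OFFSET.  Returns 0 if OFFSET is on a
   character boundary and -1 if no usable lead byte is found.  */
static int
partial_tail_len (const unsigned char *buf, int len, int offset)
{
  int p;

  if (offset == 0)
    return 0;
  /* Multi-byte chars start with any byte other than 0x80 - 0xbf, so
     scan back at most 4 bytes for such a byte.  */
  for (p = offset - 1; p >= 0 && offset - p <= 4; --p)
    if ((buf[p] & 0xc0) != 0x80)
      {
        int n = utf8_seq_len (buf[p]);
        int tail = n - (offset - p);
        if (n == 0 || p + n > len || tail < 0)
          return -1;
        return tail;
      }
  return -1;
}
```

For the five-byte string "a", U+20AC (three bytes), "z", an offset of 2 lands after the lead byte of the three-byte character, so two trailing bytes remain and would be marked as WEOF.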
2003-11-25 Jakub Jelinek <jakub@redhat.com>
* posix/regex_internal.c (re_string_allocate): Make sure init_len
is at least dfa->mb_cur_max.
(re_string_reconstruct): If is_utf8, don't fall back into
re_string_skip_chars just because idx points into the middle of a
valid UTF-8 character. Instead, set the wcs entries which correspond
to the partial character bytes to WEOF.
* posix/regexec.c (re_search_internal): Allocate input.bufs_len + 1
instead of dfa->nodes_len + 1 state_log entries initially.
* posix/bug-regex20.c (main): Uncomment backwards case insensitive
tests.
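The re_string_allocate change amounts to clamping init_len before the existing minimum is taken. A standalone sketch, assuming a hypothetical helper initial_buf_len that takes the three values as plain ints (in the patch they come from the function arguments and dfa->mb_cur_max):

```c
/* Hypothetical helper mirroring the patched computation of
   init_buf_len in re_string_allocate.  */
static int
initial_buf_len (int len, int init_len, int mb_cur_max)
{
  /* Ensure at least one character fits into the buffers.  */
  if (init_len < mb_cur_max)
    init_len = mb_cur_max;
  /* Never allocate more than the string plus one byte needs.  */
  return (len + 1 < init_len) ? len + 1 : init_len;
}
```

With len = 10, init_len = 1 and mb_cur_max = 6 this yields 6, where the old expression yielded 1. A string shorter than mb_cur_max still gets len + 1, which is enough because the whole string fits.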
--- libc/posix/bug-regex20.c.jj 2003-11-19 10:24:00.000000000 +0100
+++ libc/posix/bug-regex20.c 2003-11-25 01:37:43.000000000 +0100
@@ -271,7 +271,6 @@ main (void)
continue;
}
- /* XXX: This causes regex segfault. Disable for now.
res = re_search (&regbuf, tests[i].string, str_len, str_len, -str_len,
NULL);
if (res != tests[i].res)
@@ -280,7 +279,7 @@ main (void)
ret = 1;
regfree (&regbuf);
continue;
- } */
+ }
regfree (&regbuf);
}
--- libc/posix/regexec.c.jj 2003-11-24 23:49:53.000000000 +0100
+++ libc/posix/regexec.c 2003-11-25 13:06:02.000000000 +0100
@@ -620,7 +620,7 @@ re_search_internal (preg, string, length
multi character collating element. */
if (nmatch > 1 || dfa->has_mb_node)
{
- mctx.state_log = re_malloc (re_dfastate_t *, dfa->nodes_len + 1);
+ mctx.state_log = re_malloc (re_dfastate_t *, input.bufs_len + 1);
if (BE (mctx.state_log == NULL, 0))
{
err = REG_ESPACE;
--- libc/posix/regex_internal.c.jj 2003-11-24 09:54:20.000000000 +0100
+++ libc/posix/regex_internal.c 2003-11-25 13:26:45.000000000 +0100
@@ -55,7 +55,12 @@ re_string_allocate (pstr, str, len, init
const re_dfa_t *dfa;
{
reg_errcode_t ret;
- int init_buf_len = (len + 1 < init_len) ? len + 1: init_len;
+ int init_buf_len;
+
+ /* Ensure at least one character fits into the buffers. */
+ if (init_len < dfa->mb_cur_max)
+ init_len = dfa->mb_cur_max;
+ init_buf_len = (len + 1 < init_len) ? len + 1: init_len;
re_string_construct_common (str, len, pstr, trans, icase, dfa);
pstr->stop = pstr->len;
@@ -516,33 +521,33 @@ re_string_reconstruct (pstr, idx, eflags
/* Special case UTF-8. Multi-byte chars start with any
byte other than 0x80 - 0xbf. */
raw = pstr->raw_mbs + pstr->raw_mbs_idx;
- end = raw + (pstr->valid_len > offset - pstr->mb_cur_max
- ? pstr->valid_len : offset - pstr->mb_cur_max);
+ end = raw + (offset - pstr->mb_cur_max);
for (p = raw + offset - 1; p >= end; --p)
if ((*p & 0xc0) != 0x80)
{
mbstate_t cur_state;
wchar_t wc2;
+ int mlen;
/* XXX Don't use mbrtowc, we know which conversion
to use (UTF-8 -> UCS4). */
memset (&cur_state, 0, sizeof (cur_state));
- if (mbrtowc (&wc2, p, raw + offset - p, &cur_state)
- == raw + offset - p)
+ mlen = mbrtowc (&wc2, p, raw + pstr->len - p,
+ &cur_state) - (raw + offset - p);
+ if (mlen >= 0)
{
memset (&pstr->cur_state, '\0',
sizeof (mbstate_t));
+ pstr->valid_len = mlen;
wc = wc2;
}
break;
}
}
if (wc == WEOF)
- {
- pstr->valid_len = re_string_skip_chars (pstr, idx, &wc) - idx;
- for (wcs_idx = 0; wcs_idx < pstr->valid_len; ++wcs_idx)
- pstr->wcs[wcs_idx] = WEOF;
- }
+ pstr->valid_len = re_string_skip_chars (pstr, idx, &wc) - idx;
+ for (wcs_idx = 0; wcs_idx < pstr->valid_len; ++wcs_idx)
+ pstr->wcs[wcs_idx] = WEOF;
if (pstr->trans && wc <= 0xff)
wc = pstr->trans[wc];
pstr->tip_context = (IS_WIDE_WORD_CHAR (wc) ? CONTEXT_WORD
Jakub
* Re: [PATCH] Fix bug-regex20.c
2003-11-25 19:19 [PATCH] Fix bug-regex20.c Jakub Jelinek
@ 2003-11-26 7:25 ` Ulrich Drepper
From: Ulrich Drepper @ 2003-11-26 7:25 UTC
To: Jakub Jelinek; +Cc: Glibc hackers
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Thanks, I've applied the two regex patches.
- --
⧠Ulrich Drepper ⧠Red Hat, Inc. ⧠444 Castro St ⧠Mountain View, CA â
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
iD8DBQE/xBzX2ijCOnn/RHQRAu6oAKDLyFhfPhB+mayKUiG28sEgUhTyFACguvW0
l1z/ZIv/d2CncgcIGmcfMGc=
=Mopz
-----END PGP SIGNATURE-----