From: Jakub Jelinek <jakub@redhat.com>
To: Isamu Hasegawa <isamu@yamato.ibm.com>,
Ulrich Drepper <drepper@redhat.com>,
Roland McGrath <roland@redhat.com>
Cc: Glibc hackers <libc-hacker@sources.redhat.com>
Subject: re_string bugs
Date: Wed, 06 Nov 2002 08:45:00 -0000
Message-ID: <20021106174459.N3451@sunsite.ms.mff.cuni.cz>
Hi!
There is at least one more use of uninitialized data, which may even crash:
tip_context handling.
It can be seen, e.g., with Daniel's testcase:
#include <sys/types.h>
#include <regex.h>

int main()
{
  regex_t reg;
  regmatch_t pm[1];
  regcomp (&reg, "man", REG_ICASE);
  return regexec (&reg, "pipenightdreams", 1, pm, 0);
}
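To reproduce this under a memory checker such as valgrind, it may help to wrap the testcase so that it also checks regcomp's result and frees the pattern. This is only a sketch; run_icase_testcase is a hypothetical name. Since "man" does not occur in "pipenightdreams", a correct implementation should return REG_NOMATCH here.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical wrapper around Daniel's testcase: compile "man"
   case-insensitively, search "pipenightdreams", free the pattern and
   return regexec's result (REG_NOMATCH expected, "man" is not there). */
static int
run_icase_testcase (void)
{
  regex_t reg;
  regmatch_t pm[1];
  int ret = regcomp (&reg, "man", REG_ICASE);
  assert (ret == 0);   /* trivial pattern; failing here would be a bug */
  if (ret != 0)
    return ret;
  ret = regexec (&reg, "pipenightdreams", 1, pm, 0);
  regfree (&reg);
  return ret;
}
```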
Here, re_search_internal calls re_string_allocate with len = 15 and
init_len = 5.
The loop in it (with or without my patch from today) then skips
everything up to "ms" at the end, so match_first is 13 and
re_string_reconstruct is called with that offset.
re_string_reconstruct calls:
  pstr->tip_context = re_string_context_at (pstr, offset - 1, eflags,
                                            newline);
but mbs[12] is well beyond pstr->valid_len, and even beyond
pstr->buf_len, so with bad luck this could crash; at the very least
tip_context will be set incorrectly.
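One possible way to avoid reading past valid_len is to take the byte before the offset from the raw input whenever it lies outside the already-converted part of the buffer. The sketch uses a hypothetical, heavily simplified toy_string type and toy_tip_context function, not glibc's re_string_t or re_string_context_at. It assumes single-byte characters and that case folding/translation never changes whether a byte is a word character or a newline.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical, heavily simplified stand-in for glibc's re_string_t:
   only field names mentioned in the mail are borrowed; the real
   structure and its semantics differ. */
typedef struct
{
  const unsigned char *raw_mbs; /* the whole input string */
  int raw_mbs_idx;              /* input offset that mbs[0] corresponds to */
  const unsigned char *mbs;     /* buffered (e.g. case-folded) copy */
  int valid_len;                /* number of initialized bytes in mbs */
  int len;                      /* length of raw_mbs */
} toy_string;

/* Hypothetical context flags, not glibc's CONTEXT_* values. */
enum { TOY_CTX_NONE = 0, TOY_CTX_WORD = 1, TOY_CTX_NEWLINE = 2,
       TOY_CTX_BEGBUF = 4 };

/* Context of the byte just before absolute input position POS.  The
   buffered copy is read only where it is initialized; otherwise the raw
   input is used, so no uninitialized or out-of-bounds byte is read. */
static int
toy_tip_context (const toy_string *s, int pos, int newline)
{
  assert (pos >= 0 && pos <= s->len);
  if (pos == 0)
    return TOY_CTX_BEGBUF;
  int idx = pos - 1;               /* absolute index of the byte we need */
  int rel = idx - s->raw_mbs_idx;  /* the same byte's index within mbs */
  unsigned char c = (rel >= 0 && rel < s->valid_len)
                    ? s->mbs[rel] : s->raw_mbs[idx];
  if (isalnum (c) || c == '_')
    return TOY_CTX_WORD;
  if (newline && c == '\n')
    return TOY_CTX_NEWLINE;
  return TOY_CTX_NONE;
}
```

With the numbers from the testcase (raw_mbs_idx 0, valid_len 5, offset 13), the byte consulted is raw_mbs[12], 'a', rather than the uninitialized mbs[12].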
This only works for regexec-style searching (i.e. start 0, positive
range) and for matching, and then only if either the string is MBS with
ICASE or translation and pstr's input_len is bigger than MBS_CUR_MAX, or
mbs points into raw_mbs (i.e. non-MBS, no ICASE, no translation).
Backward searching, or advancing the offset by more than buf_len, is broken.
For backward searching, I'm afraid we need to examine the last MB_CUR_MAX
bytes before raw_mbs + raw_mbs_idx and determine where the last multibyte
character starts. For UTF-8 this is trivial: scan backwards for the first
byte that is not a continuation byte (i.e. not of the form 10xxxxxx);
for other charsets it may be more difficult.
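Assuming UTF-8, the backward scan could look roughly like this. utf8_last_char_start is a hypothetical name, and max_len stands in for MB_CUR_MAX. UTF-8 continuation bytes all have the form 10xxxxxx, so the last character starts at the nearest preceding byte that is not one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, UTF-8 only: for the bytes buf[0 .. idx-1], return
   the index at which the last (possibly multibyte) character before idx
   begins, looking back at most max_len bytes.  Returns -1 if no lead byte
   is found in that window (idx == 0 or malformed input). */
static long
utf8_last_char_start (const unsigned char *buf, size_t idx, size_t max_len)
{
  assert (buf != NULL || idx == 0);
  size_t limit = idx < max_len ? idx : max_len;
  for (size_t back = 1; back <= limit; ++back)
    {
      unsigned char b = buf[idx - back];
      if ((b & 0xC0) != 0x80)   /* ASCII byte or multibyte lead byte */
        return (long) (idx - back);
    }
  return -1;
}
```

This relies on UTF-8 being self-synchronizing. In charsets such as Shift_JIS a trailing byte can look like an ASCII or lead byte, so one would instead have to re-decode (e.g. with mbrlen) from a known character boundary.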
Another thing I'm not sure about is the re_string_context_at
implementation for MBS. Assuming all supported MBS locales encode newline
as the single byte '\n', there is IMHO a problem with:
#define IS_WORD_CHAR(ch) (isalnum (ch) || (ch) == '_')
  c = re_string_byte_at (input, idx);
  if (IS_WORD_CHAR (c))
    return CONTEXT_WORD;
Shouldn't this use re_string_wchar_at and iswalnum for MBS locales?
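A rough sketch of a multibyte-aware check, using the standard mbrtowc and iswalnum rather than regex internals. mb_is_word_char is a hypothetical name; in the real code the wide character would presumably come from re_string_wchar_at. It classifies the whole character in the current LC_CTYPE locale instead of its first byte.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: is the multibyte character starting at s (at most
   n bytes) a word character in the current LC_CTYPE locale?  Invalid or
   incomplete sequences, and the null character, count as non-word. */
static int
mb_is_word_char (const char *s, size_t n)
{
  assert (s != NULL);
  mbstate_t st;
  wchar_t wc;
  memset (&st, 0, sizeof st);      /* start in the initial shift state */
  size_t r = mbrtowc (&wc, s, n, &st);
  if (r == (size_t) -1 || r == (size_t) -2)
    return 0;                      /* invalid or truncated sequence */
  /* r == 0 means L'\0', which is neither alnum nor '_'. */
  return iswalnum ((wint_t) wc) || wc == L'_';
}
```

A program has to call setlocale (LC_CTYPE, "") or similar first; otherwise the "C" locale's encoding and classification are used.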
Jakub