public inbox for glibc-bugs-regex@sourceware.org help / color / mirror / Atom feed
From: "steve98 at gmail dot com" <sourceware-bugzilla@sourceware.org> To: glibc-bugs-regex@sourceware.org Subject: [Bug regex/25934] New: re_token_t.mb_partial used before initialization Date: Thu, 07 May 2020 00:49:10 +0000 [thread overview] Message-ID: <bug-25934-132@http.sourceware.org/bugzilla/> (raw) https://sourceware.org/bugzilla/show_bug.cgi?id=25934 Bug ID: 25934 Summary: re_token_t.mb_partial used before initialization Product: glibc Version: 2.27 Status: UNCONFIRMED Severity: normal Priority: P2 Component: regex Assignee: unassigned at sourceware dot org Reporter: steve98 at gmail dot com CC: drepper.fsp at gmail dot com Target Milestone: --- I was debugging my own program the the Valgrind/Memcheck tool, and discovered a case of regex using a variable without first initializing it. The offending code is in regcomp.c near line 328: *p++ = dfa->nodes[node].opr.c; while (++node < dfa->nodes_len && dfa->nodes[node].type == CHARACTER && dfa->nodes[node].mb_partial) *p++ = dfa->nodes[node].opr.c; The Valgrind/Memcheck reported the problem as: ==31536== Conditional jump or move depends on uninitialised value(s) ==31536== at 0x56F213D: re_compile_fastmap_iter.isra.26 (regcomp.c:328) ==31536== by 0x57023F0: __re_compile_fastmap (regcomp.c:282) ==31536== by 0x57023F0: regcomp (regcomp.c:509) ==31536== by 0x126EEF: regex_match (xxx_xxx.c:290) (The rest, plus the line above, is from our own code) The tool also reported where/how such a variable was allocated (from the heap, not the stack) ==31536== Uninitialised value was created by a heap allocation ==31536== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==31536== by 0x56F2D6A: create_token_tree.isra.14.constprop.39 (regcomp.c:3749) ==31536== by 0x56FB133: parse_bracket_exp (regcomp.c:3299) ==31536== by 0x56FB133: parse_expression (regcomp.c:2262) ==31536== by 0x56FB519: parse_branch (regcomp.c:2190) ==31536== by 0x56FB68B: parse_reg_exp (regcomp.c:2138) ==31536== by 0x56FBD7C: parse (regcomp.c:2107) ==31536== by 0x56FBD7C: re_compile_internal (regcomp.c:788) ==31536== by 0x5702331: regcomp (regcomp.c:498) ==31536== by 0x126EEF: regex_match (xxx_xxx.c:290) I reviewed the related code, and saw that the "mb_partial" field is part of the following structure (in regex_internal.h): typedef struct { union { unsigned char c; /* for CHARACTER */ re_bitset_ptr_t sbcset; /* for SIMPLE_BRACKET */ #ifdef RE_ENABLE_I18N re_charset_t *mbcset; /* for COMPLEX_BRACKET */ #endif /* RE_ENABLE_I18N */ Idx idx; /* for BACK_REF */ re_context_type ctx_type; /* for ANCHOR */ } opr; #if __GNUC__ >= 2 && !defined __STRICT_ANSI__ re_token_type_t type : 8; #else re_token_type_t type; #endif unsigned int constraint : 10; /* context constraint */ unsigned int duplicated : 1; unsigned int opt_subexp : 1; #ifdef RE_ENABLE_I18N unsigned int accept_mb : 1; /* These 2 bits can be moved into the union if needed (e.g. if running out of bits; move opr.c to opr.c.c and move the flags to opr.c.flags). */ unsigned int mb_partial : 1; #endif unsigned int word_char : 1; } re_token_t; I then compared the write operation of "mb_partial" against others, such as the near-by "accept_mb" member, and saw that indeed there are times that the "accept_mb" field is set, but the "mb_partial" field is not. For example (in regex_internal.c, near line 1450): dfa->nodes[dfa->nodes_len].constraint = 0; #ifdef RE_ENABLE_I18N dfa->nodes[dfa->nodes_len].accept_mb = ((token.type == OP_PERIOD && dfa->mb_cur_max > 1) || token.type == COMPLEX_BRACKET); #endif The code above seems to be performing initialization for the newly allocated "node", but clearly missed the mb_partial field. My problem occurred with 2.27, but I also read through the latest code, and it seems that mb_partial is not being initialized in additional places. So the problem should still exist in the latest version. If you need me to validate the problem against the latest glibc version, or provide a program to demonstrate this problem with Valgrind/Memcheck, I'll be happy to do so. -- You are receiving this mail because: You are on the CC list for the bug.
next reply other threads:[~2020-05-07 0:49 UTC|newest] Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-05-07 0:49 steve98 at gmail dot com [this message] 2020-05-07 6:07 ` [Bug regex/25934] " steve98 at gmail dot com 2020-05-07 6:37 ` sangshuduo at gmail dot com
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=bug-25934-132@http.sourceware.org/bugzilla/ \ --to=sourceware-bugzilla@sourceware.org \ --cc=glibc-bugs-regex@sourceware.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).