From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 74D72394844B; Thu, 7 May 2020 00:49:10 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 74D72394844B
From: "steve98 at gmail dot com"
To: glibc-bugs-regex@sourceware.org
Subject: [Bug regex/25934] New: re_token_t.mb_partial used before initialization
Date: Thu, 07 May 2020 00:49:10 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: regex
X-Bugzilla-Version: 2.27
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: steve98 at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter cc target_milestone
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: glibc-bugs-regex@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Glibc-bugs-regex mailing list
List-Unsubscribe: ,
List-Archive:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 07 May 2020 00:49:10 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=25934

            Bug ID: 25934
           Summary: re_token_t.mb_partial used before initialization
           Product: glibc
           Version: 2.27
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: regex
          Assignee: unassigned at sourceware dot org
          Reporter: steve98 at gmail dot com
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

I was debugging my own program with the Valgrind/Memcheck tool, and
discovered a case of regex using a variable without first initializing it.

The offending code is in regcomp.c near line 328:

          *p++ = dfa->nodes[node].opr.c;
          while (++node < dfa->nodes_len
                 && dfa->nodes[node].type == CHARACTER
                 && dfa->nodes[node].mb_partial)
            *p++ = dfa->nodes[node].opr.c;

Valgrind/Memcheck reported the problem as:

==31536== Conditional jump or move depends on uninitialised value(s)
==31536==    at 0x56F213D: re_compile_fastmap_iter.isra.26 (regcomp.c:328)
==31536==    by 0x57023F0: __re_compile_fastmap (regcomp.c:282)
==31536==    by 0x57023F0: regcomp (regcomp.c:509)
==31536==    by 0x126EEF: regex_match (xxx_xxx.c:290)
(The rest, plus the line above, is from our own code)

The tool also reported where and how the variable was allocated (from the
heap, not the stack):

==31536== Uninitialised value was created by a heap allocation
==31536==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31536==    by 0x56F2D6A: create_token_tree.isra.14.constprop.39 (regcomp.c:3749)
==31536==    by 0x56FB133: parse_bracket_exp (regcomp.c:3299)
==31536==    by 0x56FB133: parse_expression (regcomp.c:2262)
==31536==    by 0x56FB519: parse_branch (regcomp.c:2190)
==31536==    by 0x56FB68B: parse_reg_exp (regcomp.c:2138)
==31536==    by 0x56FBD7C: parse (regcomp.c:2107)
==31536==    by 0x56FBD7C: re_compile_internal (regcomp.c:788)
==31536==    by 0x5702331: regcomp (regcomp.c:498)
==31536==    by 0x126EEF: regex_match (xxx_xxx.c:290)

I reviewed the related code, and saw that the "mb_partial" field is part of
the following structure (in regex_internal.h):

typedef struct
{
  union
  {
    unsigned char c;            /* for CHARACTER */
    re_bitset_ptr_t sbcset;     /* for SIMPLE_BRACKET */
#ifdef RE_ENABLE_I18N
    re_charset_t *mbcset;       /* for COMPLEX_BRACKET */
#endif /* RE_ENABLE_I18N */
    Idx idx;                    /* for BACK_REF */
    re_context_type ctx_type;   /* for ANCHOR */
  } opr;
#if __GNUC__ >= 2 && !defined __STRICT_ANSI__
  re_token_type_t type : 8;
#else
  re_token_type_t type;
#endif
  unsigned int constraint : 10; /* context constraint */
  unsigned int duplicated : 1;
  unsigned int opt_subexp : 1;
#ifdef RE_ENABLE_I18N
  unsigned int accept_mb : 1;
  /* These 2 bits can be moved into the union if needed (e.g. if running out
     of bits; move opr.c to opr.c.c and move the flags to opr.c.flags).  */
  unsigned int mb_partial : 1;
#endif
  unsigned int word_char : 1;
} re_token_t;

I then compared the writes to "mb_partial" against those to other members,
such as the nearby "accept_mb", and saw that there are indeed places where
the "accept_mb" field is set but the "mb_partial" field is not. For example
(in regex_internal.c, near line 1450):

  dfa->nodes[dfa->nodes_len].constraint = 0;
#ifdef RE_ENABLE_I18N
  dfa->nodes[dfa->nodes_len].accept_mb =
    ((token.type == OP_PERIOD && dfa->mb_cur_max > 1)
     || token.type == COMPLEX_BRACKET);
#endif

The code above seems to be performing initialization for the newly allocated
"node", but clearly misses the mb_partial field.

My problem occurred with 2.27, but I also read through the latest code, and
it does not appear that mb_partial is initialized in any additional places,
so the problem should still exist in the latest version.

If you need me to validate the problem against the latest glibc version, or
provide a program to demonstrate this problem with Valgrind/Memcheck, I'll be
happy to do so.
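
A minimal sketch of what such a demonstration program might look like is given below. It is not the reporter's program: the helper name match_once, the REPRO_MAIN macro, and the pattern "[a-z]+" are all assumptions made for illustration. The pattern uses a bracket expression only because the allocation trace passes through parse_bracket_exp; whether this exact input reproduces the Memcheck warning on a given glibc build has not been verified.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from the report): compile PATTERN with
   regcomp and match it once against TEXT.
   Returns 1 on a match, 0 on no match, -1 if compilation fails.  */
int
match_once (const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  /* regcomp is the entry point that appears in both Memcheck traces.  */
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  /* No submatch information is requested (nmatch == 0).  */
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0 ? 1 : 0;
}

#ifdef REPRO_MAIN   /* hypothetical macro: enables a standalone main */
int
main (void)
{
  /* Assumption: picking up the user's (possibly multibyte) locale is
     what makes the RE_ENABLE_I18N code paths relevant.  */
  setlocale (LC_ALL, "");
  return match_once ("[a-z]+", "hello") == 1 ? 0 : 1;
}
#endif
```

Assuming the file is saved as repro.c (a made-up name), it could be built with "gcc -g -DREPRO_MAIN repro.c -o repro" and run as "valgrind --track-origins=yes ./repro"; the --track-origins option is what makes Memcheck print the "Uninitialised value was created by" section quoted above.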