public inbox for glibc-bugs-regex@sourceware.org
* [Bug regex/17069] New: leak in regcomp
@ 2014-06-19  5:53 konstantin.s.serebryany at gmail dot com
  2014-06-19 14:46 ` [Bug regex/17069] " fweimer at redhat dot com
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: konstantin.s.serebryany at gmail dot com @ 2014-06-19  5:53 UTC
  To: glibc-bugs-regex


https://sourceware.org/bugzilla/show_bug.cgi?id=17069

            Bug ID: 17069
           Summary: leak in regcomp
           Product: glibc
           Version: 2.20
            Status: NEW
          Severity: normal
          Priority: P2
         Component: regex
          Assignee: unassigned at sourceware dot org
          Reporter: konstantin.s.serebryany at gmail dot com
                CC: drepper.fsp at gmail dot com

regcomp has a memory leak. It is present in the old 2.15 release and in fresh trunk.

clang -fsanitize=address -g ./r.c && ASAN_OPTIONS=fast_unwind_on_malloc=0 ./a.out 1


==1371==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x4943d9 in calloc
    #1 0x7fa25a3e57c3 in parse_bracket_exp glibc-trunk/posix/regcomp.c:3045
    #2 0x7fa25a3e57c3 in parse_expression glibc-trunk/posix/regcomp.c:2265
    #3 0x7fa25a3e9181 in parse_branch glibc-trunk/posix/regcomp.c:2193
    #4 0x7fa25a3e9408 in parse_reg_exp glibc-trunk/posix/regcomp.c:2145
    #5 0x7fa25a3ea156 in parse glibc-trunk/posix/regcomp.c:2114
    #6 0x7fa25a3ea156 in re_compile_internal glibc-trunk/posix/regcomp.c:794
    #7 0x7fa25a3ece0f in __regcomp glibc-trunk/posix/regcomp.c:501
    #8 0x4b2c6e in main r.c:8

Valgrind sees it too: 
gcc -std=c99 -g ./r.c && valgrind --leak-check=full ./a.out 1
==1895== 32 bytes in 1 blocks are definitely lost in loss record 1 of 1
==1895==    at 0x4C2B1B8: calloc (vg_replace_malloc.c:618)
==1895==    by 0x4F145DE: parse_expression (regcomp.c:3057)
==1895==    by 0x4F1246F: parse_branch (regcomp.c:2170)
==1895==    by 0x4F127BD: parse_reg_exp (regcomp.c:2122)
==1895==    by 0x4F12CBF: re_compile_internal (regcomp.c:2091)
==1895==    by 0x4F16E7E: regcomp (regcomp.c:506)
==1895==    by 0x4005E9: main (r.c:8)

When this test is run with a large number of iterations, the leak is visible in 'top'.


#include <regex.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  long n = argc == 2 ? atol(argv[1]) : 1;
  for (long i = 0; i < n; i++) {
    regex_t r;
    regcomp(&r, "[^[][:alpha:][:up[^perword:]\\{-2(?<!27,}�\\p.o\n"
               
"]�����+)][:x[digit:]]\\P{^Gothic}{-109,}^{235}NNNN{214,}{-83}\\z\\w", 0);
    regfree(&r);
  } 
}   

Found with the help of regfuzz

-- 
You are receiving this mail because:
You are on the CC list for the bug.
From: "konstantin.s.serebryany at gmail dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs-regex@sourceware.org
Subject: [Bug regex/17070] New: regcomp with REG_EXTENDED uses unbounded CPU or RAM
Date: Thu, 19 Jun 2014 07:45:00 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=17070

            Bug ID: 17070
           Summary: regcomp with REG_EXTENDED uses unbounded CPU or RAM
           Product: glibc
           Version: 2.20
            Status: NEW
          Severity: normal
          Priority: P2
         Component: regex
          Assignee: unassigned at sourceware dot org
          Reporter: konstantin.s.serebryany at gmail dot com
                CC: drepper.fsp at gmail dot com

[not sure how useful these reports are, but filing just in case.]


#include <regex.h>
int main(int argc, char **argv) {
  regex_t r;
  regcomp(&r,
#if 1
"([\\u]\\N|||){85,}[:ascii:]l[:(?!graph:]x?x)",
#else
"[(?{x<]})x{146}{,78}{,154}{,211}\\P{(?>^Latin}"
"x\\w\\p{^So}\\P{Alphabetic}[:punct:]\\P{^Mc}xxx)"
"[:alnum:]{-9,}[:blankcntrl:][:upperword:][:punct:]\\e",
#endif
          REG_EXTENDED);
  regfree(&r);
}

% gcc r1.c && ./a.out

With the first pattern, regcomp never returns; most of the time is spent
in deeply recursive calls to calc_eclosure_iter.

The second case is much worse: it quickly eats all available RAM on the
machine, doing tons of allocations here:
#1  0x00007ffff7a9cf95 in __GI___libc_malloc (bytes=968) at malloc.c:2924
#2  0x00007ffff7af1e3b in create_token_tree
#3  duplicate_tree
#4  0x00007ffff7af7f6f in parse_dup_op
#5  parse_expression
#6  0x00007ffff7af6470 in parse_branch
#7  0x00007ffff7af67be in parse_reg_exp
#8  0x00007ffff7af6cc0 in parse
#9  re_compile_internal


Checked with 2.15 and fresh trunk; the tests were generated by regfuzz.


end of thread, last message 2015-02-18 14:31 UTC

Thread overview: 13+ messages
2014-06-19  5:53 [Bug regex/17069] New: leak in regcomp konstantin.s.serebryany at gmail dot com
2014-06-19 14:46 ` [Bug regex/17069] " fweimer at redhat dot com
2014-06-19 17:01 ` cvs-commit at gcc dot gnu.org
2014-06-19 20:35 ` schwab@linux-m68k.org
2014-06-20  4:08 ` konstantin.s.serebryany at gmail dot com
2014-06-20  4:09 ` konstantin.s.serebryany at gmail dot com
2014-06-20  4:41 ` konstantin.s.serebryany at gmail dot com
2014-06-20 12:04 ` cvs-commit at gcc dot gnu.org
2014-06-20 12:06 ` schwab@linux-m68k.org
2014-06-20 12:28 ` konstantin.s.serebryany at gmail dot com
2014-06-22  7:46 ` cvs-commit at gcc dot gnu.org
2014-08-28 10:26 ` cvs-commit at gcc dot gnu.org
2015-02-18 14:31 ` fweimer at redhat dot com
