* 'trusted and reasonably structured' regular expressions @ 2021-02-24 4:53 Siddhesh Poyarekar 2021-02-24 18:16 ` Paul Eggert 2021-02-24 22:37 ` Joseph Myers 0 siblings, 2 replies; 14+ messages in thread From: Siddhesh Poyarekar @ 2021-02-24 4:53 UTC (permalink / raw) To: Siddhesh Poyarekar via Libc-alpha; +Cc: Florian Weimer, Carlos O'Donell Hi, The security exceptions[1] wiki page mentions that buffer overflows and other similar bugs will be treated as security vulnerabilities provided (well it actually says 'assuming' but I interpreted it as 'provided') that the offending pattern is 'trusted and reasonably well structured'. It appears to offload the responsibility of sanitizing regular expressions to the caller, which seems like the responsibility of the regular expression library. We have not rejected CVEs on this basis in the past[2] but it seems like something we should remedy as a policy. It seems unreasonable to me to expect users to parse regular expressions to validate them before passing them on to the regular expression parser library. Thoughts? Siddhesh [1] https://sourceware.org/glibc/wiki/Security%20Exceptions [2] https://sourceware.org/bugzilla/show_bug.cgi?id=24114 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions 2021-02-24 4:53 'trusted and reasonably structured' regular expressions Siddhesh Poyarekar @ 2021-02-24 18:16 ` Paul Eggert 2021-02-24 19:18 ` Florian Weimer 2021-02-24 22:37 ` Joseph Myers 1 sibling, 1 reply; 14+ messages in thread From: Paul Eggert @ 2021-02-24 18:16 UTC (permalink / raw) To: Siddhesh Poyarekar; +Cc: GNU C Library On 2/23/21 8:53 PM, Siddhesh Poyarekar wrote: > It appears to offload the responsibility of sanitizing regular > expressions to the caller, which seems like the responsibility of the > regular expression library. We have not rejected CVEs on this basis in > the past[2] but it seems like something we should remedy as a policy. Since POSIX regular expression matching is NP-complete, wouldn't we need to prove P=NP in order to forestall ReDoS attacks? Seems a bit much to ask.... I'm not saying that we shouldn't fix buffer overflows, only that it's not feasible to fix CVE-related bugs in an area where both standards and widespread existing use force glibc (and other regular expression libraries) to be insecure. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions 2021-02-24 18:16 ` Paul Eggert @ 2021-02-24 19:18 ` Florian Weimer 2021-02-24 19:49 ` Paul Eggert 0 siblings, 1 reply; 14+ messages in thread From: Florian Weimer @ 2021-02-24 19:18 UTC (permalink / raw) To: Paul Eggert; +Cc: Siddhesh Poyarekar, GNU C Library * Paul Eggert: > On 2/23/21 8:53 PM, Siddhesh Poyarekar wrote: > >> It appears to offload the responsibility of sanitizing regular >> expressions to the caller, which seems like the responsibility of >> the regular expression library. We have not rejected CVEs on this >> basis in the past[2] but it seems like something we should remedy as >> a policy. > > Since POSIX regular expression matching is NP-complete, wouldn't we > need to prove P=NP in order to forestall ReDoS attacks? Seems a bit > much to ask.... POSIX Extended regexps are not NP-complete (if I understand these terms correctly), but our implementation still accepts backreferences as an extension, and changing that would likely need a new symbol version or flag, to preserve backwards compatibility. But the backreference problem is indeed the main reason why I thought that it wouldn't make sense to treat denial-of-service issues as vulnerabilities. Thanks, Florian ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions 2021-02-24 19:18 ` Florian Weimer @ 2021-02-24 19:49 ` Paul Eggert 2021-02-24 19:56 ` Florian Weimer 0 siblings, 1 reply; 14+ messages in thread From: Paul Eggert @ 2021-02-24 19:49 UTC (permalink / raw) To: Florian Weimer; +Cc: Siddhesh Poyarekar, GNU C Library On 2/24/21 11:18 AM, Florian Weimer wrote: > POSIX Extended regexps are not NP-complete (if I understand these terms > correctly) Hmm, I thought they were. See, for example, Schmid's paper[1], which says "The matching problem of regex, i. e., deciding whether a given regex can match a given word, is NP-complete (even for strongly restricted variants)". Schmid defines "regex" to mean regular expressions with backreferences. But as you wrote, this theoretical point is swamped by the practical matter that glibc's matcher (and all other matchers that I know of) are exponential for this sort of thing. [1] Schmid ML. Regular Expressions with Backreferences: Polynomial-Time Matching Techniques. 2019-03-14. https://arxiv.org/abs/1903.05896 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions 2021-02-24 19:49 ` Paul Eggert @ 2021-02-24 19:56 ` Florian Weimer 0 siblings, 0 replies; 14+ messages in thread From: Florian Weimer @ 2021-02-24 19:56 UTC (permalink / raw) To: Paul Eggert; +Cc: Siddhesh Poyarekar, GNU C Library * Paul Eggert: > On 2/24/21 11:18 AM, Florian Weimer wrote: >> POSIX Extended regexps are not NP-complete (if I understand these terms >> correctly) > > Hmm, I thought they were. See, for example, Schmid's paper[1], which > says "The matching problem of regex, i. e., deciding whether a given > regex can match a given word, is NP-complete (even for strongly > restricted variants)". Schmid defines "regex" to mean regular > expressions with backreferences. POSIX Basic regexps support backreferences, POSIX Extended regexps do not. So the terminology is kind of reversed there. Thanks, Florian ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions 2021-02-24 4:53 'trusted and reasonably structured' regular expressions Siddhesh Poyarekar 2021-02-24 18:16 ` Paul Eggert @ 2021-02-24 22:37 ` Joseph Myers 2021-02-25 2:51 ` Siddhesh Poyarekar 1 sibling, 1 reply; 14+ messages in thread From: Joseph Myers @ 2021-02-24 22:37 UTC (permalink / raw) To: Siddhesh Poyarekar; +Cc: Siddhesh Poyarekar via Libc-alpha, Florian Weimer On Wed, 24 Feb 2021, Siddhesh Poyarekar wrote: > The security exceptions[1] wiki page mentions that buffer overflows and other > similar bugs will be treated as security vulnerabilities provided (well it > actually says 'assuming' but I interpreted it as 'provided') that the > offending pattern is 'trusted and reasonably well structured'. > > It appears to offload the responsibility of sanitizing regular expressions to > the caller, which seems like the responsibility of the regular expression > library. We have not rejected CVEs on this basis in the past[2] but it seems > like something we should remedy as a policy. It seems unreasonable to me to > expect users to parse regular expressions to validate them before passing them > on to the regular expression parser library. I think we should treat buffer overflows and similar (runtime undefined behavior in libc, including unbounded alloca / VLAs) as security bugs even for untrusted regular expressions. But unbounded time or memory usage (including the case of stack overflow with unbounded recursion as opposed to a single large stack allocation) should not be considered security bugs if they only arise from untrusted regular expressions and the code avoids undefined behavior (checks malloc return value, checks for overflows calculating allocation sizes, etc.). (If the code fails to check for malloc returning NULL and so has a null pointer dereference as a result of unbounded memory usage, that would be a low-priority security bug. If such an error, e.g. failing to avoid integer overflow calculating an allocation size, is more plausibly exploitable for code execution, it would be a more serious security bug.) -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions 2021-02-24 22:37 ` Joseph Myers @ 2021-02-25 2:51 ` Siddhesh Poyarekar 2021-02-25 3:03 ` Zack Weinberg 0 siblings, 1 reply; 14+ messages in thread From: Siddhesh Poyarekar @ 2021-02-25 2:51 UTC (permalink / raw) To: Joseph Myers; +Cc: Siddhesh Poyarekar via Libc-alpha, Florian Weimer On 2/25/21 4:07 AM, Joseph Myers wrote: > I think we should treat buffer overflows and similar (runtime undefined > behavior in libc, including unbounded alloca / VLAs) as security bugs even > for untrusted regular expressions. But unbounded time or memory usage > (including the case of stack overflow with unbounded recursion as opposed > to a single large stack allocation) should not be considered security bugs > if they only arise from untrusted regular expressions and the code avoids > undefined behavior (checks malloc return value, checks for overflows > calculating allocation sizes, etc.). (If the code fails to check for > malloc returning NULL and so has a null pointer dereference as a result of > unbounded memory usage, that would be a low-priority security bug. If > such an error, e.g. failing to avoid integer overflow calculating an > allocation size, is more plausibly exploitable for code execution, it > would be a more serious security bug.) Yes, the resource usage exception encapsulates the concerns Paul and Florian have too. A different line in the regular expression security exception already mentions this: ==== Consequently, resource exhaustion issues which can be triggered only with crafted patterns (either during compilation or execution) are not treated as security bugs. (This does not mean we do not intend to fix such issues as regular bugs if possible.) ==== I'll change the following line in the security exception: ==== However, during execution, crashes, infinite loops, buffer overflows and reading past buffers (read-only buffer overruns), memory leaks and other, similar bugs should be treated as security vulnerabilities, assuming that the pattern is trusted and reasonably structured. ==== to read as: ==== However, crashes, infinite loops, buffer overflows and overreads, memory leaks and other bugs resulting from the regex implementation relying on undefined behavior should be treated as security vulnerabilities. ==== which I understand is the consensus on this thread. Does that look OK? Thanks, Siddhesh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions 2021-02-25 2:51 ` Siddhesh Poyarekar @ 2021-02-25 3:03 ` Zack Weinberg 2021-02-25 3:15 ` Siddhesh Poyarekar 0 siblings, 1 reply; 14+ messages in thread From: Zack Weinberg @ 2021-02-25 3:03 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: Joseph Myers, Florian Weimer, Siddhesh Poyarekar via Libc-alpha On Wed, Feb 24, 2021 at 9:51 PM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote: > ==== > Consequently, resource exhaustion issues which can be triggered only > with crafted patterns (either during compilation or execution) are not > treated as security bugs. (This does not mean we do not intend to fix > such issues as regular bugs if possible.) > ==== > > I'll change the following line in the security exception: > > ==== > However, during execution, crashes, infinite loops, buffer overflows and > reading past buffers (read-only buffer overruns), memory leaks and > other, similar bugs should be treated as security vulnerabilities, > assuming that the pattern is trusted and reasonably structured. > ==== > > to read as: > > ==== > However, crashes, infinite loops, buffer overflows and overreads, memory > leaks and other bugs resulting from the regex implementation relying on > undefined behavior should be treated as security vulnerabilities. > ==== Probably stack-busting recursion should be specifically called out as something we *don't* promise to be able to fix. I'd also suggest saying that "merely" quadratic or exponential backtracking behavior from e.g. /(x+x+)+y/ is different than a genuinely infinite loop. zw ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions 2021-02-25 3:03 ` Zack Weinberg @ 2021-02-25 3:15 ` Siddhesh Poyarekar 2021-02-25 3:26 ` Zack Weinberg 2021-02-25 4:53 ` Paul Eggert 0 siblings, 2 replies; 14+ messages in thread From: Siddhesh Poyarekar @ 2021-02-25 3:15 UTC (permalink / raw) To: Zack Weinberg Cc: Joseph Myers, Florian Weimer, Siddhesh Poyarekar via Libc-alpha On 2/25/21 8:33 AM, Zack Weinberg wrote: > On Wed, Feb 24, 2021 at 9:51 PM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote: >> ==== >> Consequently, resource exhaustion issues which can be triggered only >> with crafted patterns (either during compilation or execution) are not >> treated as security bugs. (This does not mean we do not intend to fix >> such issues as regular bugs if possible.) >> ==== >> >> I'll change the following line in the security exception: >> >> ==== >> However, during execution, crashes, infinite loops, buffer overflows and >> reading past buffers (read-only buffer overruns), memory leaks and >> other, similar bugs should be treated as security vulnerabilities, >> assuming that the pattern is trusted and reasonably structured. >> ==== >> >> to read as: >> >> ==== >> However, crashes, infinite loops, buffer overflows and overreads, memory >> leaks and other bugs resulting from the regex implementation relying on >> undefined behavior should be treated as security vulnerabilities. >> ==== > > Probably stack-busting recursion should be specifically called out as > something we *don't* promise to be able to fix. I'd also suggest > saying that "merely" quadratic or exponential backtracking behavior > from e.g. /(x+x+)+y/ is different than a genuinely infinite loop. How about this; it's the full replacement blurb now and not just the one paragraph: ==== Regular expression processing comes in two parts, compilation (through regcomp) and execution (through regexec). Implementing regular expressions efficiently, in a standard-conforming way, and without denial-of-service vulnerabilities is very difficult and impossible for Basic Regular Expressions. Most implementation strategies have issues dealing with certain classes of patterns. Consequently, certain issues which can be triggered only with crafted patterns (either during compilation or execution) are treated as regular bugs and not security issues. Examples of such issues would include (but is not limited to): * Running out of memory through valid use of malloc * Quadratic or exponential behaviour resulting in slow execution time * Stack overflows due to recursion when processing patterns Crashes, user controlled unbounded alloca, infinite loops (and not merely exponential behavior), buffer overflows and overreads, memory leaks and other bugs resulting from the regex implementation relying on undefined behavior should be treated as security vulnerabilities. ==== Siddhesh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions 2021-02-25 3:15 ` Siddhesh Poyarekar @ 2021-02-25 3:26 ` Zack Weinberg 2021-02-25 4:53 ` Paul Eggert 1 sibling, 0 replies; 14+ messages in thread From: Zack Weinberg @ 2021-02-25 3:26 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: Joseph Myers, Florian Weimer, Siddhesh Poyarekar via Libc-alpha On Wed, Feb 24, 2021 at 10:15 PM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote: > > How about this; it's the full replacement blurb now and not just the one > paragraph: > > ==== > Regular expression processing comes in two parts, compilation (through > regcomp) and execution (through regexec). > > Implementing regular expressions efficiently, in a standard-conforming > way, and without denial-of-service vulnerabilities is very difficult and > impossible for Basic Regular Expressions. Most implementation strategies > have issues dealing with certain classes of patterns. > > Consequently, certain issues which can be triggered only with crafted > patterns (either during compilation or execution) are treated as regular > bugs and not security issues. Examples of such issues would include > (but is not limited to): > > * Running out of memory through valid use of malloc > * Quadratic or exponential behaviour resulting in slow execution time > * Stack overflows due to recursion when processing patterns > > Crashes, user controlled unbounded alloca, infinite loops (and not > merely exponential behavior), buffer overflows and overreads, memory > leaks and other bugs resulting from the regex implementation relying on > undefined behavior should be treated as security vulnerabilities. > ==== Yeah, that looks good to me zw ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions 2021-02-25 3:15 ` Siddhesh Poyarekar 2021-02-25 3:26 ` Zack Weinberg @ 2021-02-25 4:53 ` Paul Eggert 2021-02-25 5:27 ` Siddhesh Poyarekar 1 sibling, 1 reply; 14+ messages in thread From: Paul Eggert @ 2021-02-25 4:53 UTC (permalink / raw) To: Siddhesh Poyarekar, Zack Weinberg Cc: Florian Weimer, Siddhesh Poyarekar via Libc-alpha, Joseph Myers On 2/24/21 7:15 PM, Siddhesh Poyarekar wrote: > * Stack overflows due to recursion when processing patterns > > Crashes, user controlled unbounded alloca, I don't understand why user-controlled unbounded alloca is in a different category from user-controlled unbounded recursion. In both cases you exhaust the stack. So I suggest removing "user controlled unbounded alloca,". Other than that it looks good. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions 2021-02-25 4:53 ` Paul Eggert @ 2021-02-25 5:27 ` Siddhesh Poyarekar 2021-02-26 21:12 ` Paul Eggert 0 siblings, 1 reply; 14+ messages in thread From: Siddhesh Poyarekar @ 2021-02-25 5:27 UTC (permalink / raw) To: Paul Eggert, Zack Weinberg Cc: Florian Weimer, Siddhesh Poyarekar via Libc-alpha, Joseph Myers On 2/25/21 10:23 AM, Paul Eggert wrote: > On 2/24/21 7:15 PM, Siddhesh Poyarekar wrote: >> * Stack overflows due to recursion when processing patterns >> >> Crashes, user controlled unbounded alloca, > > I don't understand why user-controlled unbounded alloca is in a > different category from user-controlled unbounded recursion. In both > cases you exhaust the stack. So I suggest removing "user controlled > unbounded alloca,". I wanted to make the distinction between repeated stack allocations vs a single unbounded allocation due to user input, wherein the latter ought to be forbidden in all of glibc. Would you agree with that or do you think we should stay away from allocation related failures altogether? Siddhesh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions 2021-02-25 5:27 ` Siddhesh Poyarekar @ 2021-02-26 21:12 ` Paul Eggert 2021-03-02 10:58 ` Siddhesh Poyarekar 0 siblings, 1 reply; 14+ messages in thread From: Paul Eggert @ 2021-02-26 21:12 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: Florian Weimer, Siddhesh Poyarekar via Libc-alpha, Joseph Myers, Zack Weinberg On 2/24/21 9:27 PM, Siddhesh Poyarekar wrote: > > I wanted to make the distinction between repeated stack allocations vs a > single unbounded allocation due to user input, wherein the latter ought > to be forbidden in all of glibc. Would you agree with that or do you > think we should stay away from allocation related failures altogether? For this particular case (regexp allocation where exponential behavior is inevitable) I think we should stay away from allocation-related failures entirely, as we're getting into the technical weeds here. Untrusted regexps can cause resource problems in lots of different ways: they can consume CPU time within a thread, they can bottleneck other threads from running, they can consume heap, they can consume stack, and I've probably forgotten some. Trying to distinguish among these problems in a policy document merely obscures the big picture which is pretty simple: don't use untrusted regexps. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions 2021-02-26 21:12 ` Paul Eggert @ 2021-03-02 10:58 ` Siddhesh Poyarekar 0 siblings, 0 replies; 14+ messages in thread From: Siddhesh Poyarekar @ 2021-03-02 10:58 UTC (permalink / raw) To: Paul Eggert Cc: Florian Weimer, Siddhesh Poyarekar via Libc-alpha, Joseph Myers, Zack Weinberg On 2/27/21 2:42 AM, Paul Eggert wrote: > For this particular case (regexp allocation where exponential behavior > is inevitable) I think we should stay away from allocation-related > failures entirely, as we're getting into the technical weeds here. > > Untrusted regexps can cause resource problems in lots of different ways: > they can consume CPU time within a thread, they can bottleneck other > threads from running, they can consume heap, they can consume stack, and > I've probably forgotten some. Trying to distinguish among these problems > in a policy document merely obscures the big picture which is pretty > simple: don't use untrusted regexps. I've dropped 'unbounded alloca' and published the rest: https://sourceware.org/glibc/wiki/Security%20Exceptions Thanks, Siddhesh ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2021-03-02 10:58 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-02-24 4:53 'trusted and reasonably structured' regular expressions Siddhesh Poyarekar 2021-02-24 18:16 ` Paul Eggert 2021-02-24 19:18 ` Florian Weimer 2021-02-24 19:49 ` Paul Eggert 2021-02-24 19:56 ` Florian Weimer 2021-02-24 22:37 ` Joseph Myers 2021-02-25 2:51 ` Siddhesh Poyarekar 2021-02-25 3:03 ` Zack Weinberg 2021-02-25 3:15 ` Siddhesh Poyarekar 2021-02-25 3:26 ` Zack Weinberg 2021-02-25 4:53 ` Paul Eggert 2021-02-25 5:27 ` Siddhesh Poyarekar 2021-02-26 21:12 ` Paul Eggert 2021-03-02 10:58 ` Siddhesh Poyarekar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).