* 'trusted and reasonably structured' regular expressions
@ 2021-02-24 4:53 Siddhesh Poyarekar
2021-02-24 18:16 ` Paul Eggert
2021-02-24 22:37 ` Joseph Myers
0 siblings, 2 replies; 14+ messages in thread
From: Siddhesh Poyarekar @ 2021-02-24 4:53 UTC (permalink / raw)
To: Siddhesh Poyarekar via Libc-alpha; +Cc: Florian Weimer, Carlos O'Donell
Hi,
The security exceptions[1] wiki page mentions that buffer overflows and
other similar bugs will be treated as security vulnerabilities provided
(well it actually says 'assuming' but I interpreted it as 'provided')
that the offending pattern is 'trusted and reasonably well structured'.
It appears to offload the responsibility of sanitizing regular
expressions to the caller, which seems like the responsibility of the
regular expression library. We have not rejected CVEs on this basis in
the past[2] but it seems like something we should remedy as a policy.
It seems unreasonable to me to expect users to parse regular expressions
to validate them before passing them on to the regular expression parser
library.
Thoughts?
Siddhesh
[1] https://sourceware.org/glibc/wiki/Security%20Exceptions
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=24114
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions
2021-02-24 4:53 'trusted and reasonably structured' regular expressions Siddhesh Poyarekar
@ 2021-02-24 18:16 ` Paul Eggert
2021-02-24 19:18 ` Florian Weimer
2021-02-24 22:37 ` Joseph Myers
1 sibling, 1 reply; 14+ messages in thread
From: Paul Eggert @ 2021-02-24 18:16 UTC (permalink / raw)
To: Siddhesh Poyarekar; +Cc: GNU C Library
On 2/23/21 8:53 PM, Siddhesh Poyarekar wrote:
> It appears to offload the responsibility of sanitizing regular
> expressions to the caller, which seems like the responsibility of the
> regular expression library. We have not rejected CVEs on this basis in
> the past[2] but it seems like something we should remedy as a policy.
Since POSIX regular expression matching is NP-complete, wouldn't we need
to prove P=NP in order to forestall ReDoS attacks? Seems a bit much to
ask....
I'm not saying that we shouldn't fix buffer overflows, only that it's
not feasible to fix CVE-related bugs in an area where both standards and
widespread existing use force glibc (and other regular expression
libraries) to be insecure.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions
2021-02-24 18:16 ` Paul Eggert
@ 2021-02-24 19:18 ` Florian Weimer
2021-02-24 19:49 ` Paul Eggert
0 siblings, 1 reply; 14+ messages in thread
From: Florian Weimer @ 2021-02-24 19:18 UTC (permalink / raw)
To: Paul Eggert; +Cc: Siddhesh Poyarekar, GNU C Library
* Paul Eggert:
> On 2/23/21 8:53 PM, Siddhesh Poyarekar wrote:
>
>> It appears to offload the responsibility of sanitizing regular
>> expressions to the caller, which seems like the responsibility of
>> the regular expression library. We have not rejected CVEs on this
>> basis in the past[2] but it seems like something we should remedy as
>> a policy.
>
> Since POSIX regular expression matching is NP-complete, wouldn't we
> need to prove P=NP in order to forestall ReDoS attacks? Seems a bit
> much to ask....
POSIX Extended regexps are not NP-complete (if I understand these terms
correctly), but our implementation still accepts backreferences as an
extension, and changing that would likely need a new symbol version or
flag, to preserve backwards compatibility.
But the backreference problem is indeed the main reason why I thought
that it wouldn't make sense to treat denial-of-service issues as
vulnerabilities.
Thanks,
Florian
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions
2021-02-24 19:18 ` Florian Weimer
@ 2021-02-24 19:49 ` Paul Eggert
2021-02-24 19:56 ` Florian Weimer
0 siblings, 1 reply; 14+ messages in thread
From: Paul Eggert @ 2021-02-24 19:49 UTC (permalink / raw)
To: Florian Weimer; +Cc: Siddhesh Poyarekar, GNU C Library
On 2/24/21 11:18 AM, Florian Weimer wrote:
> POSIX Extended regexps are not NP-complete (if I understand these terms
> correctly)
Hmm, I thought they were. See, for example, Schmid's paper[1], which
says "The matching problem of regex, i. e., deciding whether a given
regex can match a given word, is NP-complete (even for strongly
restricted variants)". Schmid defines "regex" to mean regular
expressions with backreferences.
But as you wrote, this theoretical point is swamped by the practical
matter that glibc's matcher (and all other matchers that I know of) are
exponential for this sort of thing.
[1] Schmid ML. Regular Expressions with Backreferences: Polynomial-Time
Matching Techniques. 2019-03-14. https://arxiv.org/abs/1903.05896
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions
2021-02-24 19:49 ` Paul Eggert
@ 2021-02-24 19:56 ` Florian Weimer
0 siblings, 0 replies; 14+ messages in thread
From: Florian Weimer @ 2021-02-24 19:56 UTC (permalink / raw)
To: Paul Eggert; +Cc: Siddhesh Poyarekar, GNU C Library
* Paul Eggert:
> On 2/24/21 11:18 AM, Florian Weimer wrote:
>> POSIX Extended regexps are not NP-complete (if I understand these terms
>> correctly)
>
> Hmm, I thought they were. See, for example, Schmid's paper[1], which
> says "The matching problem of regex, i. e., deciding whether a given
> regex can match a given word, is NP-complete (even for strongly
> restricted variants)". Schmid defines "regex" to mean regular
> expressions with backreferences.
POSIX Basic regexps support backreferences, POSIX Extended regexps do
not. So the terminology is kind of reversed there.
Thanks,
Florian
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions
2021-02-24 4:53 'trusted and reasonably structured' regular expressions Siddhesh Poyarekar
2021-02-24 18:16 ` Paul Eggert
@ 2021-02-24 22:37 ` Joseph Myers
2021-02-25 2:51 ` Siddhesh Poyarekar
1 sibling, 1 reply; 14+ messages in thread
From: Joseph Myers @ 2021-02-24 22:37 UTC (permalink / raw)
To: Siddhesh Poyarekar; +Cc: Siddhesh Poyarekar via Libc-alpha, Florian Weimer
On Wed, 24 Feb 2021, Siddhesh Poyarekar wrote:
> The security exceptions[1] wiki page mentions that buffer overflows and other
> similar bugs will be treated as security vulnerabilities provided (well it
> actually says 'assuming' but I interpreted it as 'provided') that the
> offending pattern is 'trusted and reasonably well structured'.
>
> It appears to offload the responsibility of sanitizing regular expressions to
> the caller, which seems like the responsibility of the regular expression
> library. We have not rejected CVEs on this basis in the past[2] but it seems
> like something we should remedy as a policy. It seems unreasonable to me to
> expect users to parse regular expressions to validate them before passing them
> on to the regular expression parser library.
I think we should treat buffer overflows and similar (runtime undefined
behavior in libc, including unbounded alloca / VLAs) as security bugs even
for untrusted regular expressions. But unbounded time or memory usage
(including the case of stack overflow with unbounded recursion as opposed
to a single large stack allocation) should not be considered security bugs
if they only arise from untrusted regular expressions and the code avoids
undefined behavior (checks malloc return value, checks for overflows
calculating allocation sizes, etc.). (If the code fails to check for
malloc returning NULL and so has a null pointer dereference as a result of
unbounded memory usage, that would be a low-priority security bug. If
such an error, e.g. failing to avoid integer overflow calculating an
allocation size, is more plausibly exploitable for code execution, it
would be a more serious security bug.)
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions
2021-02-24 22:37 ` Joseph Myers
@ 2021-02-25 2:51 ` Siddhesh Poyarekar
2021-02-25 3:03 ` Zack Weinberg
0 siblings, 1 reply; 14+ messages in thread
From: Siddhesh Poyarekar @ 2021-02-25 2:51 UTC (permalink / raw)
To: Joseph Myers; +Cc: Siddhesh Poyarekar via Libc-alpha, Florian Weimer
On 2/25/21 4:07 AM, Joseph Myers wrote:
> I think we should treat buffer overflows and similar (runtime undefined
> behavior in libc, including unbounded alloca / VLAs) as security bugs even
> for untrusted regular expressions. But unbounded time or memory usage
> (including the case of stack overflow with unbounded recursion as opposed
> to a single large stack allocation) should not be considered security bugs
> if they only arise from untrusted regular expressions and the code avoids
> undefined behavior (checks malloc return value, checks for overflows
> calculating allocation sizes, etc.). (If the code fails to check for
> malloc returning NULL and so has a null pointer dereference as a result of
> unbounded memory usage, that would be a low-priority security bug. If
> such an error, e.g. failing to avoid integer overflow calculating an
> allocation size, is more plausibly exploitable for code execution, it
> would be a more serious security bug.)
Yes, the resource usage exception encapsulates the concerns Paul and
Florian have too. A different line in the regular expression security
exception already mentions this:
====
Consequently, resource exhaustion issues which can be triggered only
with crafted patterns (either during compilation or execution) are not
treated as security bugs. (This does not mean we do not intend to fix
such issues as regular bugs if possible.)
====
I'll change the following line in the security exception:
====
However, during execution, crashes, infinite loops, buffer overflows and
reading past buffers (read-only buffer overruns), memory leaks and
other, similar bugs should be treated as security vulnerabilities,
assuming that the pattern is trusted and reasonably structured.
====
to read as:
====
However, crashes, infinite loops, buffer overflows and overreads, memory
leaks and other bugs resulting from the regex implementation relying on
undefined behavior should be treated as security vulnerabilities.
====
which I understand is the consensus on this thread. Does that look OK?
Thanks,
Siddhesh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions
2021-02-25 2:51 ` Siddhesh Poyarekar
@ 2021-02-25 3:03 ` Zack Weinberg
2021-02-25 3:15 ` Siddhesh Poyarekar
0 siblings, 1 reply; 14+ messages in thread
From: Zack Weinberg @ 2021-02-25 3:03 UTC (permalink / raw)
To: Siddhesh Poyarekar
Cc: Joseph Myers, Florian Weimer, Siddhesh Poyarekar via Libc-alpha
On Wed, Feb 24, 2021 at 9:51 PM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> ====
> Consequently, resource exhaustion issues which can be triggered only
> with crafted patterns (either during compilation or execution) are not
> treated as security bugs. (This does not mean we do not intend to fix
> such issues as regular bugs if possible.)
> ====
>
> I'll change the following line in the security exception:
>
> ====
> However, during execution, crashes, infinite loops, buffer overflows and
> reading past buffers (read-only buffer overruns), memory leaks and
> other, similar bugs should be treated as security vulnerabilities,
> assuming that the pattern is trusted and reasonably structured.
> ====
>
> to read as:
>
> ====
> However, crashes, infinite loops, buffer overflows and overreads, memory
> leaks and other bugs resulting from the regex implementation relying on
> undefined behavior should be treated as security vulnerabilities.
> ====
Probably stack-busting recursion should be specifically called out as
something we *don't* promise to be able to fix. I'd also suggest
saying that "merely" quadratic or exponential backtracking behavior
from e.g. /(x+x+)+y/ is different than a genuinely infinite loop.
zw
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions
2021-02-25 3:03 ` Zack Weinberg
@ 2021-02-25 3:15 ` Siddhesh Poyarekar
2021-02-25 3:26 ` Zack Weinberg
2021-02-25 4:53 ` Paul Eggert
0 siblings, 2 replies; 14+ messages in thread
From: Siddhesh Poyarekar @ 2021-02-25 3:15 UTC (permalink / raw)
To: Zack Weinberg
Cc: Joseph Myers, Florian Weimer, Siddhesh Poyarekar via Libc-alpha
On 2/25/21 8:33 AM, Zack Weinberg wrote:
> On Wed, Feb 24, 2021 at 9:51 PM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>> ====
>> Consequently, resource exhaustion issues which can be triggered only
>> with crafted patterns (either during compilation or execution) are not
>> treated as security bugs. (This does not mean we do not intend to fix
>> such issues as regular bugs if possible.)
>> ====
>>
>> I'll change the following line in the security exception:
>>
>> ====
>> However, during execution, crashes, infinite loops, buffer overflows and
>> reading past buffers (read-only buffer overruns), memory leaks and
>> other, similar bugs should be treated as security vulnerabilities,
>> assuming that the pattern is trusted and reasonably structured.
>> ====
>>
>> to read as:
>>
>> ====
>> However, crashes, infinite loops, buffer overflows and overreads, memory
>> leaks and other bugs resulting from the regex implementation relying on
>> undefined behavior should be treated as security vulnerabilities.
>> ====
>
> Probably stack-busting recursion should be specifically called out as
> something we *don't* promise to be able to fix. I'd also suggest
> saying that "merely" quadratic or exponential backtracking behavior
> from e.g. /(x+x+)+y/ is different than a genuinely infinite loop.
How about this; it's the full replacement blurb now and not just the one
paragraph:
====
Regular expression processing comes in two parts, compilation (through
regcomp) and execution (through regexec).
Implementing regular expressions efficiently, in a standard-conforming
way, and without denial-of-service vulnerabilities is very difficult and
impossible for Basic Regular Expressions. Most implementation strategies
have issues dealing with certain classes of patterns.
Consequently, certain issues which can be triggered only with crafted
patterns (either during compilation or execution) are treated as regular
bugs and not security issues. Examples of such issues would include
(but is not limited to):
* Running out of memory through valid use of malloc
* Quadratic or exponential behaviour resulting in slow execution time
* Stack overflows due to recursion when processing patterns
Crashes, user controlled unbounded alloca, infinite loops (and not
merely exponential behavior), buffer overflows and overreads, memory
leaks and other bugs resulting from the regex implementation relying on
undefined behavior should be treated as security vulnerabilities.
====
Siddhesh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions
2021-02-25 3:15 ` Siddhesh Poyarekar
@ 2021-02-25 3:26 ` Zack Weinberg
2021-02-25 4:53 ` Paul Eggert
1 sibling, 0 replies; 14+ messages in thread
From: Zack Weinberg @ 2021-02-25 3:26 UTC (permalink / raw)
To: Siddhesh Poyarekar
Cc: Joseph Myers, Florian Weimer, Siddhesh Poyarekar via Libc-alpha
On Wed, Feb 24, 2021 at 10:15 PM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>
> How about this; it's the full replacement blurb now and not just the one
> paragraph:
>
> ====
> Regular expression processing comes in two parts, compilation (through
> regcomp) and execution (through regexec).
>
> Implementing regular expressions efficiently, in a standard-conforming
> way, and without denial-of-service vulnerabilities is very difficult and
> impossible for Basic Regular Expressions. Most implementation strategies
> have issues dealing with certain classes of patterns.
>
> Consequently, certain issues which can be triggered only with crafted
> patterns (either during compilation or execution) are treated as regular
> bugs and not security issues. Examples of such issues would include
> (but is not limited to):
>
> * Running out of memory through valid use of malloc
> * Quadratic or exponential behaviour resulting in slow execution time
> * Stack overflows due to recursion when processing patterns
>
> Crashes, user controlled unbounded alloca, infinite loops (and not
> merely exponential behavior), buffer overflows and overreads, memory
> leaks and other bugs resulting from the regex implementation relying on
> undefined behavior should be treated as security vulnerabilities.
> ====
Yeah, that looks good to me
zw
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions
2021-02-25 3:15 ` Siddhesh Poyarekar
2021-02-25 3:26 ` Zack Weinberg
@ 2021-02-25 4:53 ` Paul Eggert
2021-02-25 5:27 ` Siddhesh Poyarekar
1 sibling, 1 reply; 14+ messages in thread
From: Paul Eggert @ 2021-02-25 4:53 UTC (permalink / raw)
To: Siddhesh Poyarekar, Zack Weinberg
Cc: Florian Weimer, Siddhesh Poyarekar via Libc-alpha, Joseph Myers
On 2/24/21 7:15 PM, Siddhesh Poyarekar wrote:
> * Stack overflows due to recursion when processing patterns
>
> Crashes, user controlled unbounded alloca,
I don't understand why user-controlled unbounded alloca is in a
different category from user-controlled unbounded recursion. In both
cases you exhaust the stack. So I suggest removing "user controlled
unbounded alloca,".
Other than that it looks good.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions
2021-02-25 4:53 ` Paul Eggert
@ 2021-02-25 5:27 ` Siddhesh Poyarekar
2021-02-26 21:12 ` Paul Eggert
0 siblings, 1 reply; 14+ messages in thread
From: Siddhesh Poyarekar @ 2021-02-25 5:27 UTC (permalink / raw)
To: Paul Eggert, Zack Weinberg
Cc: Florian Weimer, Siddhesh Poyarekar via Libc-alpha, Joseph Myers
On 2/25/21 10:23 AM, Paul Eggert wrote:
> On 2/24/21 7:15 PM, Siddhesh Poyarekar wrote:
>> * Stack overflows due to recursion when processing patterns
>>
>> Crashes, user controlled unbounded alloca,
>
> I don't understand why user-controlled unbounded alloca is in a
> different category from user-controlled unbounded recursion. In both
> cases you exhaust the stack. So I suggest removing "user controlled
> unbounded alloca,".
I wanted to make the distinction between repeated stack allocations vs a
single unbounded allocation due to user input, wherein the latter ought
to be forbidden in all of glibc. Would you agree with that or do you
think we should stay away from allocation related failures altogether?
Siddhesh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions
2021-02-25 5:27 ` Siddhesh Poyarekar
@ 2021-02-26 21:12 ` Paul Eggert
2021-03-02 10:58 ` Siddhesh Poyarekar
0 siblings, 1 reply; 14+ messages in thread
From: Paul Eggert @ 2021-02-26 21:12 UTC (permalink / raw)
To: Siddhesh Poyarekar
Cc: Florian Weimer, Siddhesh Poyarekar via Libc-alpha, Joseph Myers,
Zack Weinberg
On 2/24/21 9:27 PM, Siddhesh Poyarekar wrote:
>
> I wanted to make the distinction between repeated stack allocations vs a
> single unbounded allocation due to user input, wherein the latter ought
> to be forbidden in all of glibc. Would you agree with that or do you
> think we should stay away from allocation related failures altogether?
For this particular case (regexp allocation where exponential behavior
is inevitable) I think we should stay away from allocation-related
failures entirely, as we're getting into the technical weeds here.
Untrusted regexps can cause resource problems in lots of different ways:
they can consume CPU time within a thread, they can bottleneck other
threads from running, they can consume heap, they can consume stack, and
I've probably forgotten some. Trying to distinguish among these problems
in a policy document merely obscures the big picture which is pretty
simple: don't use untrusted regexps.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 'trusted and reasonably structured' regular expressions
2021-02-26 21:12 ` Paul Eggert
@ 2021-03-02 10:58 ` Siddhesh Poyarekar
0 siblings, 0 replies; 14+ messages in thread
From: Siddhesh Poyarekar @ 2021-03-02 10:58 UTC (permalink / raw)
To: Paul Eggert
Cc: Florian Weimer, Siddhesh Poyarekar via Libc-alpha, Joseph Myers,
Zack Weinberg
On 2/27/21 2:42 AM, Paul Eggert wrote:
> For this particular case (regexp allocation where exponential behavior
> is inevitable) I think we should stay away from allocation-related
> failures entirely, as we're getting into the technical weeds here.
>
> Untrusted regexps can cause resource problems in lots of different ways:
> they can consume CPU time within a thread, they can bottleneck other
> threads from running, they can consume heap, they can consume stack, and
> I've probably forgotten some. Trying to distinguish among these problems
> in a policy document merely obscures the big picture which is pretty
> simple: don't use untrusted regexps.
I've dropped 'unbounded alloca' and published the rest:
https://sourceware.org/glibc/wiki/Security%20Exceptions
Thanks,
Siddhesh
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2021-03-02 10:58 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-24 4:53 'trusted and reasonably structured' regular expressions Siddhesh Poyarekar
2021-02-24 18:16 ` Paul Eggert
2021-02-24 19:18 ` Florian Weimer
2021-02-24 19:49 ` Paul Eggert
2021-02-24 19:56 ` Florian Weimer
2021-02-24 22:37 ` Joseph Myers
2021-02-25 2:51 ` Siddhesh Poyarekar
2021-02-25 3:03 ` Zack Weinberg
2021-02-25 3:15 ` Siddhesh Poyarekar
2021-02-25 3:26 ` Zack Weinberg
2021-02-25 4:53 ` Paul Eggert
2021-02-25 5:27 ` Siddhesh Poyarekar
2021-02-26 21:12 ` Paul Eggert
2021-03-02 10:58 ` Siddhesh Poyarekar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).