public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* 'trusted and reasonably structured' regular expressions
@ 2021-02-24  4:53 Siddhesh Poyarekar
  2021-02-24 18:16 ` Paul Eggert
  2021-02-24 22:37 ` Joseph Myers
  0 siblings, 2 replies; 14+ messages in thread
From: Siddhesh Poyarekar @ 2021-02-24  4:53 UTC (permalink / raw)
  To: Siddhesh Poyarekar via Libc-alpha; +Cc: Florian Weimer, Carlos O'Donell

Hi,

The security exceptions[1] wiki page mentions that buffer overflows and 
other similar bugs will be treated as security vulnerabilities provided 
(well it actually says 'assuming' but I interpreted it as 'provided') 
that the offending pattern is 'trusted and reasonably well structured'.

It appears to offload the responsibility of sanitizing regular 
expressions to the caller, which seems like the responsibility of the 
regular expression library.  We have not rejected CVEs on this basis in 
the past[2] but it seems like something we should remedy as a policy. 
It seems unreasonable to me to expect users to parse regular expressions 
to validate them before passing them on to the regular expression parser 
library.

Thoughts?

Siddhesh

[1] https://sourceware.org/glibc/wiki/Security%20Exceptions
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=24114

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 'trusted and reasonably structured' regular expressions
  2021-02-24  4:53 'trusted and reasonably structured' regular expressions Siddhesh Poyarekar
@ 2021-02-24 18:16 ` Paul Eggert
  2021-02-24 19:18   ` Florian Weimer
  2021-02-24 22:37 ` Joseph Myers
  1 sibling, 1 reply; 14+ messages in thread
From: Paul Eggert @ 2021-02-24 18:16 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: GNU C Library

On 2/23/21 8:53 PM, Siddhesh Poyarekar wrote:

> It appears to offload the responsibility of sanitizing regular 
> expressions to the caller, which seems like the responsibility of the 
> regular expression library.  We have not rejected CVEs on this basis in 
> the past[2] but it seems like something we should remedy as a policy.

Since POSIX regular expression matching is NP-complete, wouldn't we need 
to prove P=NP in order to forestall ReDoS attacks? Seems a bit much to 
ask....

I'm not saying that we shouldn't fix buffer overflows, only that it's 
not feasible to fix CVE-related bugs in an area where both standards and 
widespread existing use force glibc (and other regular expression 
libraries) to be insecure.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 'trusted and reasonably structured' regular expressions
  2021-02-24 18:16 ` Paul Eggert
@ 2021-02-24 19:18   ` Florian Weimer
  2021-02-24 19:49     ` Paul Eggert
  0 siblings, 1 reply; 14+ messages in thread
From: Florian Weimer @ 2021-02-24 19:18 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Siddhesh Poyarekar, GNU C Library

* Paul Eggert:

> On 2/23/21 8:53 PM, Siddhesh Poyarekar wrote:
>
>> It appears to offload the responsibility of sanitizing regular
>> expressions to the caller, which seems like the responsibility of
>> the regular expression library.  We have not rejected CVEs on this
>> basis in the past[2] but it seems like something we should remedy as
>> a policy.
>
> Since POSIX regular expression matching is NP-complete, wouldn't we
> need to prove P=NP in order to forestall ReDoS attacks? Seems a bit
> much to ask....

POSIX Extended regexps are not NP-complete (if I understand these terms
correctly), but our implementation still accepts backreferences as an
extension, and changing that would likely need a new symbol version or
flag, to preserve backwards compatibility.

But the backreference problem is indeed the main reason why I thought
that it wouldn't make sense to treat denial-of-service issues as
vulnerabilities.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 'trusted and reasonably structured' regular expressions
  2021-02-24 19:18   ` Florian Weimer
@ 2021-02-24 19:49     ` Paul Eggert
  2021-02-24 19:56       ` Florian Weimer
  0 siblings, 1 reply; 14+ messages in thread
From: Paul Eggert @ 2021-02-24 19:49 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Siddhesh Poyarekar, GNU C Library

On 2/24/21 11:18 AM, Florian Weimer wrote:
> POSIX Extended regexps are not NP-complete (if I understand these terms
> correctly)

Hmm, I thought they were. See, for example, Schmid's paper[1], which 
says "The matching problem of regex, i. e., deciding whether a given 
regex can match a given word, is NP-complete (even for strongly 
restricted variants)". Schmid defines "regex" to mean regular 
expressions with backreferences.

But as you wrote, this theoretical point is swamped by the practical 
matter that glibc's matcher (and all other matchers that I know of) are 
exponential for this sort of thing.

[1] Schmid ML. Regular Expressions with Backreferences: Polynomial-Time 
Matching Techniques. 2019-03-14. https://arxiv.org/abs/1903.05896

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 'trusted and reasonably structured' regular expressions
  2021-02-24 19:49     ` Paul Eggert
@ 2021-02-24 19:56       ` Florian Weimer
  0 siblings, 0 replies; 14+ messages in thread
From: Florian Weimer @ 2021-02-24 19:56 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Siddhesh Poyarekar, GNU C Library

* Paul Eggert:

> On 2/24/21 11:18 AM, Florian Weimer wrote:
>> POSIX Extended regexps are not NP-complete (if I understand these terms
>> correctly)
>
> Hmm, I thought they were. See, for example, Schmid's paper[1], which
> says "The matching problem of regex, i. e., deciding whether a given 
> regex can match a given word, is NP-complete (even for strongly
> restricted variants)". Schmid defines "regex" to mean regular 
> expressions with backreferences.

POSIX Basic regexps support backreferences, POSIX Extended regexps do
not.  So the terminology is kind of reversed there.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 'trusted and reasonably structured' regular expressions
  2021-02-24  4:53 'trusted and reasonably structured' regular expressions Siddhesh Poyarekar
  2021-02-24 18:16 ` Paul Eggert
@ 2021-02-24 22:37 ` Joseph Myers
  2021-02-25  2:51   ` Siddhesh Poyarekar
  1 sibling, 1 reply; 14+ messages in thread
From: Joseph Myers @ 2021-02-24 22:37 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Siddhesh Poyarekar via Libc-alpha, Florian Weimer

On Wed, 24 Feb 2021, Siddhesh Poyarekar wrote:

> The security exceptions[1] wiki page mentions that buffer overflows and other
> similar bugs will be treated as security vulnerabilities provided (well it
> actually says 'assuming' but I interpreted it as 'provided') that the
> offending pattern is 'trusted and reasonably well structured'.
> 
> It appears to offload the responsibility of sanitizing regular expressions to
> the caller, which seems like the responsibility of the regular expression
> library.  We have not rejected CVEs on this basis in the past[2] but it seems
> like something we should remedy as a policy. It seems unreasonable to me to
> expect users to parse regular expressions to validate them before passing them
> on to the regular expression parser library.

I think we should treat buffer overflows and similar (runtime undefined 
behavior in libc, including unbounded alloca / VLAs) as security bugs even 
for untrusted regular expressions.  But unbounded time or memory usage 
(including the case of stack overflow with unbounded recursion as opposed 
to a single large stack allocation) should not be considered security bugs 
if they only arise from untrusted regular expressions and the code avoids 
undefined behavior (checks malloc return value, checks for overflows 
calculating allocation sizes, etc.).  (If the code fails to check for 
malloc returning NULL and so has a null pointer dereference as a result of 
unbounded memory usage, that would be a low-priority security bug.  If 
such an error, e.g. failing to avoid integer overflow calculating an 
allocation size, is more plausibly exploitable for code execution, it 
would be a more serious security bug.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 'trusted and reasonably structured' regular expressions
  2021-02-24 22:37 ` Joseph Myers
@ 2021-02-25  2:51   ` Siddhesh Poyarekar
  2021-02-25  3:03     ` Zack Weinberg
  0 siblings, 1 reply; 14+ messages in thread
From: Siddhesh Poyarekar @ 2021-02-25  2:51 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Siddhesh Poyarekar via Libc-alpha, Florian Weimer

On 2/25/21 4:07 AM, Joseph Myers wrote:
> I think we should treat buffer overflows and similar (runtime undefined
> behavior in libc, including unbounded alloca / VLAs) as security bugs even
> for untrusted regular expressions.  But unbounded time or memory usage
> (including the case of stack overflow with unbounded recursion as opposed
> to a single large stack allocation) should not be considered security bugs
> if they only arise from untrusted regular expressions and the code avoids
> undefined behavior (checks malloc return value, checks for overflows
> calculating allocation sizes, etc.).  (If the code fails to check for
> malloc returning NULL and so has a null pointer dereference as a result of
> unbounded memory usage, that would be a low-priority security bug.  If
> such an error, e.g. failing to avoid integer overflow calculating an
> allocation size, is more plausibly exploitable for code execution, it
> would be a more serious security bug.)

Yes, the resource usage exception encapsulates the concerns Paul and 
Florian have too.  A different line in the regular expression security 
exception already mentions this:

====
Consequently, resource exhaustion issues which can be triggered only 
with crafted patterns (either during compilation or execution) are not 
treated as security bugs. (This does not mean we do not intend to fix 
such issues as regular bugs if possible.)
====

I'll change the following line in the security exception:

====
However, during execution, crashes, infinite loops, buffer overflows and 
reading past buffers (read-only buffer overruns), memory leaks and 
other, similar bugs should be treated as security vulnerabilities, 
assuming that the pattern is trusted and reasonably structured.
====

to read as:

====
However, crashes, infinite loops, buffer overflows and overreads, memory 
leaks and other bugs resulting from the regex implementation relying on 
undefined behavior should be treated as security vulnerabilities.
====

which I understand is the consensus on this thread.  Does that look OK?

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 'trusted and reasonably structured' regular expressions
  2021-02-25  2:51   ` Siddhesh Poyarekar
@ 2021-02-25  3:03     ` Zack Weinberg
  2021-02-25  3:15       ` Siddhesh Poyarekar
  0 siblings, 1 reply; 14+ messages in thread
From: Zack Weinberg @ 2021-02-25  3:03 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Joseph Myers, Florian Weimer, Siddhesh Poyarekar via Libc-alpha

On Wed, Feb 24, 2021 at 9:51 PM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> ====
> Consequently, resource exhaustion issues which can be triggered only
> with crafted patterns (either during compilation or execution) are not
> treated as security bugs. (This does not mean we do not intend to fix
> such issues as regular bugs if possible.)
> ====
>
> I'll change the following line in the security exception:
>
> ====
> However, during execution, crashes, infinite loops, buffer overflows and
> reading past buffers (read-only buffer overruns), memory leaks and
> other, similar bugs should be treated as security vulnerabilities,
> assuming that the pattern is trusted and reasonably structured.
> ====
>
> to read as:
>
> ====
> However, crashes, infinite loops, buffer overflows and overreads, memory
> leaks and other bugs resulting from the regex implementation relying on
> undefined behavior should be treated as security vulnerabilities.
> ====

Probably stack-busting recursion should be specifically called out as
something we *don't* promise to be able to fix.  I'd also suggest
saying that "merely" quadratic or exponential backtracking behavior
from e.g. /(x+x+)+y/ is different than a genuinely infinite loop.

zw

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 'trusted and reasonably structured' regular expressions
  2021-02-25  3:03     ` Zack Weinberg
@ 2021-02-25  3:15       ` Siddhesh Poyarekar
  2021-02-25  3:26         ` Zack Weinberg
  2021-02-25  4:53         ` Paul Eggert
  0 siblings, 2 replies; 14+ messages in thread
From: Siddhesh Poyarekar @ 2021-02-25  3:15 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Joseph Myers, Florian Weimer, Siddhesh Poyarekar via Libc-alpha

On 2/25/21 8:33 AM, Zack Weinberg wrote:
> On Wed, Feb 24, 2021 at 9:51 PM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>> ====
>> Consequently, resource exhaustion issues which can be triggered only
>> with crafted patterns (either during compilation or execution) are not
>> treated as security bugs. (This does not mean we do not intend to fix
>> such issues as regular bugs if possible.)
>> ====
>>
>> I'll change the following line in the security exception:
>>
>> ====
>> However, during execution, crashes, infinite loops, buffer overflows and
>> reading past buffers (read-only buffer overruns), memory leaks and
>> other, similar bugs should be treated as security vulnerabilities,
>> assuming that the pattern is trusted and reasonably structured.
>> ====
>>
>> to read as:
>>
>> ====
>> However, crashes, infinite loops, buffer overflows and overreads, memory
>> leaks and other bugs resulting from the regex implementation relying on
>> undefined behavior should be treated as security vulnerabilities.
>> ====
> 
> Probably stack-busting recursion should be specifically called out as
> something we *don't* promise to be able to fix.  I'd also suggest
> saying that "merely" quadratic or exponential backtracking behavior
> from e.g. /(x+x+)+y/ is different than a genuinely infinite loop.

How about this; it's the full replacement blurb now and not just the one 
paragraph:

====
Regular expression processing comes in two parts, compilation (through 
regcomp) and execution (through regexec).

Implementing regular expressions efficiently, in a standard-conforming 
way, and without denial-of-service vulnerabilities is very difficult and 
impossible for Basic Regular Expressions. Most implementation strategies 
have issues dealing with certain classes of patterns.

Consequently, certain issues which can be triggered only with crafted 
patterns (either during compilation or execution) are treated as regular 
bugs and not security issues.  Examples of such issues would include 
(but is not limited to):

* Running out of memory through valid use of malloc
* Quadratic or exponential behaviour resulting in slow execution time
* Stack overflows due to recursion when processing patterns

Crashes, user controlled unbounded alloca, infinite loops (and not 
merely exponential behavior), buffer overflows and overreads, memory 
leaks and other bugs resulting from the regex implementation relying on 
undefined behavior should be treated as security vulnerabilities.
====

Siddhesh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 'trusted and reasonably structured' regular expressions
  2021-02-25  3:15       ` Siddhesh Poyarekar
@ 2021-02-25  3:26         ` Zack Weinberg
  2021-02-25  4:53         ` Paul Eggert
  1 sibling, 0 replies; 14+ messages in thread
From: Zack Weinberg @ 2021-02-25  3:26 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Joseph Myers, Florian Weimer, Siddhesh Poyarekar via Libc-alpha

On Wed, Feb 24, 2021 at 10:15 PM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>
> How about this; it's the full replacement blurb now and not just the one
> paragraph:
>
> ====
> Regular expression processing comes in two parts, compilation (through
> regcomp) and execution (through regexec).
>
> Implementing regular expressions efficiently, in a standard-conforming
> way, and without denial-of-service vulnerabilities is very difficult and
> impossible for Basic Regular Expressions. Most implementation strategies
> have issues dealing with certain classes of patterns.
>
> Consequently, certain issues which can be triggered only with crafted
> patterns (either during compilation or execution) are treated as regular
> bugs and not security issues.  Examples of such issues would include
> (but is not limited to):
>
> * Running out of memory through valid use of malloc
> * Quadratic or exponential behaviour resulting in slow execution time
> * Stack overflows due to recursion when processing patterns
>
> Crashes, user controlled unbounded alloca, infinite loops (and not
> merely exponential behavior), buffer overflows and overreads, memory
> leaks and other bugs resulting from the regex implementation relying on
> undefined behavior should be treated as security vulnerabilities.
> ====

Yeah, that looks good to me

zw

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 'trusted and reasonably structured' regular expressions
  2021-02-25  3:15       ` Siddhesh Poyarekar
  2021-02-25  3:26         ` Zack Weinberg
@ 2021-02-25  4:53         ` Paul Eggert
  2021-02-25  5:27           ` Siddhesh Poyarekar
  1 sibling, 1 reply; 14+ messages in thread
From: Paul Eggert @ 2021-02-25  4:53 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Zack Weinberg
  Cc: Florian Weimer, Siddhesh Poyarekar via Libc-alpha, Joseph Myers

On 2/24/21 7:15 PM, Siddhesh Poyarekar wrote:
> * Stack overflows due to recursion when processing patterns
> 
> Crashes, user controlled unbounded alloca,

I don't understand why user-controlled unbounded alloca is in a 
different category from user-controlled unbounded recursion. In both 
cases you exhaust the stack. So I suggest removing "user controlled 
unbounded alloca,".

Other than that it looks good.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 'trusted and reasonably structured' regular expressions
  2021-02-25  4:53         ` Paul Eggert
@ 2021-02-25  5:27           ` Siddhesh Poyarekar
  2021-02-26 21:12             ` Paul Eggert
  0 siblings, 1 reply; 14+ messages in thread
From: Siddhesh Poyarekar @ 2021-02-25  5:27 UTC (permalink / raw)
  To: Paul Eggert, Zack Weinberg
  Cc: Florian Weimer, Siddhesh Poyarekar via Libc-alpha, Joseph Myers

On 2/25/21 10:23 AM, Paul Eggert wrote:
> On 2/24/21 7:15 PM, Siddhesh Poyarekar wrote:
>> * Stack overflows due to recursion when processing patterns
>>
>> Crashes, user controlled unbounded alloca,
> 
> I don't understand why user-controlled unbounded alloca is in a 
> different category from user-controlled unbounded recursion. In both 
> cases you exhaust the stack. So I suggest removing "user controlled 
> unbounded alloca,".

I wanted to make the distinction between repeated stack allocations vs a 
single unbounded allocation due to user input, wherein the latter ought 
to be forbidden in all of glibc.  Would you agree with that or do you 
think we should stay away from allocation related failures altogether?

Siddhesh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 'trusted and reasonably structured' regular expressions
  2021-02-25  5:27           ` Siddhesh Poyarekar
@ 2021-02-26 21:12             ` Paul Eggert
  2021-03-02 10:58               ` Siddhesh Poyarekar
  0 siblings, 1 reply; 14+ messages in thread
From: Paul Eggert @ 2021-02-26 21:12 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Florian Weimer, Siddhesh Poyarekar via Libc-alpha, Joseph Myers,
	Zack Weinberg

On 2/24/21 9:27 PM, Siddhesh Poyarekar wrote:
> 
> I wanted to make the distinction between repeated stack allocations vs a 
> single unbounded allocation due to user input, wherein the latter ought 
> to be forbidden in all of glibc.  Would you agree with that or do you 
> think we should stay away from allocation related failures altogether?

For this particular case (regexp allocation where exponential behavior 
is inevitable) I think we should stay away from allocation-related 
failures entirely, as we're getting into the technical weeds here.

Untrusted regexps can cause resource problems in lots of different ways: 
they can consume CPU time within a thread, they can bottleneck other 
threads from running, they can consume heap, they can consume stack, and 
I've probably forgotten some. Trying to distinguish among these problems 
in a policy document merely obscures the big picture which is pretty 
simple: don't use untrusted regexps.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 'trusted and reasonably structured' regular expressions
  2021-02-26 21:12             ` Paul Eggert
@ 2021-03-02 10:58               ` Siddhesh Poyarekar
  0 siblings, 0 replies; 14+ messages in thread
From: Siddhesh Poyarekar @ 2021-03-02 10:58 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Florian Weimer, Siddhesh Poyarekar via Libc-alpha, Joseph Myers,
	Zack Weinberg

On 2/27/21 2:42 AM, Paul Eggert wrote:
> For this particular case (regexp allocation where exponential behavior 
> is inevitable) I think we should stay away from allocation-related 
> failures entirely, as we're getting into the technical weeds here.
> 
> Untrusted regexps can cause resource problems in lots of different ways: 
> they can consume CPU time within a thread, they can bottleneck other 
> threads from running, they can consume heap, they can consume stack, and 
> I've probably forgotten some. Trying to distinguish among these problems 
> in a policy document merely obscures the big picture which is pretty 
> simple: don't use untrusted regexps.

I've dropped 'unbounded alloca' and published the rest:

https://sourceware.org/glibc/wiki/Security%20Exceptions

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-03-02 10:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-24  4:53 'trusted and reasonably structured' regular expressions Siddhesh Poyarekar
2021-02-24 18:16 ` Paul Eggert
2021-02-24 19:18   ` Florian Weimer
2021-02-24 19:49     ` Paul Eggert
2021-02-24 19:56       ` Florian Weimer
2021-02-24 22:37 ` Joseph Myers
2021-02-25  2:51   ` Siddhesh Poyarekar
2021-02-25  3:03     ` Zack Weinberg
2021-02-25  3:15       ` Siddhesh Poyarekar
2021-02-25  3:26         ` Zack Weinberg
2021-02-25  4:53         ` Paul Eggert
2021-02-25  5:27           ` Siddhesh Poyarekar
2021-02-26 21:12             ` Paul Eggert
2021-03-02 10:58               ` Siddhesh Poyarekar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).