public inbox for glibc-bugs-regex@sourceware.org
From: "bonzini at gnu dot org" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs-regex@sourceware.org
Subject: [Bug regex/52] Repeated and nested subexpressions (reproducible in most other engines)
Date: Thu, 04 Jul 2013 07:22:00 -0000
Message-ID: <bug-52-132-SI4Gg5aecF@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-52-132@http.sourceware.org/bugzilla/>

http://sourceware.org/bugzilla/show_bug.cgi?id=52

--- Comment #11 from Paolo Bonzini <bonzini at gnu dot org> ---
The justification for the "suspended" state is that this would be very
complicated to fix and wouldn't really add anything to the quality of the
implementation.

Even if the "(a(b)*)*" case were not hard to fix, I'm not sure we can say
the same of the backreference testcase in the RH bug ('(a(b)*)*\2' matched
against 'abab') or the more complicated '(a(b)*)*x\1\2' matched against
'abaxa'.
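
To make the testcases above concrete, here is a minimal sketch that runs them
and prints what each parenthesized group captured. It uses Python's `re`
module purely as a stand-in for "another engine"; it is not glibc's regexec
and does not claim POSIX semantics. The subject string 'aba' for the first
pattern is an assumption (the comment above does not give one), and `probe`
is a hypothetical helper name. No expected results are listed for the
backreference cases, since those are exactly what engines disagree on.

```python
import re

# Patterns and subjects from the comment above. The subject "aba" for the
# first pattern is an assumption; the other two pairs are quoted as given.
CASES = [
    (r"(a(b)*)*", "aba"),
    (r"(a(b)*)*\2", "abab"),
    (r"(a(b)*)*x\1\2", "abaxa"),
]


def probe(pattern, subject):
    """Return (whole match, group 1, group 2), or None if nothing matches.

    Hypothetical helper: it only reports what this particular engine
    (Python's re) captured, so the results can be compared by hand with
    what another engine or the POSIX wording would give.
    """
    m = re.search(pattern, subject)
    if m is None:
        return None
    return (m.group(0), m.group(1), m.group(2))


if __name__ == "__main__":
    for pattern, subject in CASES:
        print(f"{pattern!r} on {subject!r}: {probe(pattern, subject)}")
```

The interesting value is group 2 in the first case: an engine that keeps the
capture from an earlier iteration of the outer group reports 'b', while one
that clears inner captures on each new iteration reports it as unset.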

That the RH bugzilla was opened by Eric doesn't really add anything to the
urgency of this bug; Fedora bugs that also exist upstream can be closed
liberally, and that's what I did.

The grep bug on Savannah (https://savannah.gnu.org/bugs/?37737) might add
something, but it is not clear if the user actually encountered it in a
real-world use case.  The same "bug" is present in nearly every regular
expression matcher, and I would suggest that the Austin group give more leeway
to implementations.  For example, the following rules could work:

- it is undefined _which_ occurrence of the sub-RE is captured by a
parenthesized group (and matched in backreferences) if the subgroup, or any of
its parents, is quantified with + * {};

- the backreference should match the empty string only if the
corresponding sub-RE can be empty, or if the corresponding parenthesized group,
or any of its parents, is quantified with * or {0,...}.




Thread overview: 12+ messages
     [not found] <bug-52-132@http.sourceware.org/bugzilla/>
2012-03-08  4:59 ` carlos at systemhalted dot org
2013-07-03 21:25 ` eblake at redhat dot com
2013-07-04  0:28 ` bugdal at aerifal dot cx
2013-07-04  3:37 ` carlos at redhat dot com
2013-07-04  3:38 ` carlos at redhat dot com
2013-07-04  7:22 ` bonzini at gnu dot org [this message]
2013-07-04  7:35 ` bugdal at aerifal dot cx
2013-07-04  7:42 ` bonzini at gnu dot org
2013-07-04  7:51 ` bugdal at aerifal dot cx
2013-07-04  8:04 ` bonzini at gnu dot org
2013-07-04 15:05 ` eggert at gnu dot org
2014-01-31  9:15 ` afaq.ahmed at agilosoft dot com
