From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 4988 invoked by alias); 4 Jul 2013 07:22:10 -0000
Mailing-List: contact glibc-bugs-regex-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: glibc-bugs-regex-owner@sourceware.org
Received: (qmail 4900 invoked by uid 48); 4 Jul 2013 07:21:58 -0000
From: "bonzini at gnu dot org"
To: glibc-bugs-regex@sourceware.org
Subject: [Bug regex/52] Repeated and nested subexpressions (reproducible in most other engines)
Date: Thu, 04 Jul 2013 07:22:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: regex
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords:
X-Bugzilla-Severity: minor
X-Bugzilla-Who: bonzini at gnu dot org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: gotom at debian dot or.jp
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-07/txt/msg00004.txt.bz2

http://sourceware.org/bugzilla/show_bug.cgi?id=52

--- Comment #11 from Paolo Bonzini ---
The justification for the "suspended" state is that this would be very
complicated to fix and wouldn't really add anything to the quality of the
implementation.

Even if the "(a(b)*)*" case were not hard to fix, I'm not sure we can say
the same of the backreference testcase in the RH bug ('(a(b)*)*\2' matched
against 'abab') or the more complicated '(a(b)*)*x\1\2' matched against
'abaxa'.

That the RH bugzilla was opened by Eric doesn't really add anything to the
urgency of this bug; Fedora bugs that also exist upstream can be closed
liberally, and that's what I did.

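To see what another engine reports for the patterns quoted above, here is
a minimal sketch using Python's "re" module. It is a backtracking matcher,
not glibc's regex, so it shows only how one other engine behaves, not what
POSIX requires. The subject 'aba' for the first pattern is an assumption
(the comment gives none), and report() is a hypothetical helper name.

```python
import re

# Patterns and subjects from the comment above. The subject for the first
# pattern is an assumption; the other two pairs are quoted from the comment.
CASES = [
    (r"(a(b)*)*", "aba"),
    (r"(a(b)*)*\2", "abab"),
    (r"(a(b)*)*x\1\2", "abaxa"),
]


def report(pattern, text):
    """Return (whole_match, groups) for a match anchored at the start of
    text, or None if the pattern does not match there."""
    m = re.match(pattern, text)
    if m is None:
        return None
    # groups() holds one entry per parenthesized group; an entry is None
    # when the engine reports that group as not having participated.
    return m.group(0), m.groups()


if __name__ == "__main__":
    for pattern, text in CASES:
        print(repr(pattern), repr(text), report(pattern, text))
```

The entry to look at is the second group: whether an engine reports it as
unset, or keeps the 'b' captured in an earlier iteration of the outer
group, is exactly the point on which implementations differ.
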
The grep bug on Savannah (https://savannah.gnu.org/bugs/?37737) might add
something, but it is not clear whether the user actually encountered it in
a real-world use case.

The same "bug" is present in nearly every regular expression matcher, and I
would suggest that the Austin group give more leeway to implementations.
For example, the following rules could work:

- it is undefined _which_ occurrence of the sub-RE is captured by a
  parenthesized group (and matched in backreferences) if the subgroup, or
  any of its parents, is quantified with + * {};

- the backreference should match the empty string only if the
  corresponding sub-RE can be empty, or if the corresponding parenthesized
  group, or any of its parents, is quantified with * or {0,...}.

-- 
You are receiving this mail because:
You are on the CC list for the bug.