public inbox for libc-hacker@sourceware.org
* Fix regexp parsing with RE_CONTEXT_INDEP_OPS
@ 2003-12-13  9:56 Andreas Schwab
  2003-12-13 10:52 ` Jakub Jelinek
From: Andreas Schwab @ 2003-12-13  9:56 UTC
  To: libc-hacker

When parsed with RE_SYNTAX_EGREP (which includes RE_CONTEXT_INDEP_OPS)
the regexp "(*)b" results in the error "unmatched ) or \)", but it
should compile successfully (treating it as if "()b").  This is part
of the Spencer tests in grep.

Andreas.

2003-12-13  Andreas Schwab  <schwab@suse.de>

	* posix/regcomp.c (parse_expression): Don't recurse when the end
	of a subexp is seen after a repetition operator with
	RE_CONTEXT_INVALID_DUP.

--- posix/regcomp.c.~1.67.~	2003-11-30 00:08:06.000000000 +0100
+++ posix/regcomp.c	2003-12-13 09:59:26.000000000 +0100
@@ -2191,6 +2191,8 @@ parse_expression (regexp, preg, token, s
       else if (syntax & RE_CONTEXT_INDEP_OPS)
 	{
 	  fetch_token (token, regexp, syntax);
+	  if (token->type == OP_CLOSE_SUBEXP || token->type == OP_ALT)
+	    return NULL;
 	  return parse_expression (regexp, preg, token, syntax, nest, err);
 	}
       /* else fall through  */
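
The effect of the two added lines can be sketched with a toy
recursive-descent recognizer.  This is a simplified model under stated
assumptions, not the glibc code: it handles only literals, '(' ')', '|'
and '*', builds no tree, and every name in it (toy_compile,
toy_expression, and so on) is invented here.  The apply_fix flag
switches the modelled check on and off:

```c
enum toy_err { TOY_OK = 0, TOY_EPAREN, TOY_ERPAREN };

struct toy_parser
{
  const char *p;   /* current token: one character of the pattern */
  int nest;        /* number of currently open groups */
  int apply_fix;   /* nonzero: model the two lines added by the patch */
};

static enum toy_err toy_alternation (struct toy_parser *ps);

/* Parse one expression, loosely following parse_expression.  */
static enum toy_err
toy_expression (struct toy_parser *ps)
{
  enum toy_err err;

  switch (*ps->p)
    {
    case '\0':
    case '|':
      return TOY_OK;                /* empty expression */
    case '(':
      ps->p++;
      ps->nest++;
      err = toy_alternation (ps);
      if (err != TOY_OK)
        return err;
      if (*ps->p != ')')
        return TOY_EPAREN;          /* group never closed */
      ps->p++;
      ps->nest--;
      break;
    case ')':
      return TOY_ERPAREN;           /* ")" where an expression must start */
    case '*':
      ps->p++;                      /* skip the leading repetition operator */
      if (ps->apply_fix && (*ps->p == ')' || *ps->p == '|'))
        return TOY_OK;              /* modelled fix: nothing left to parse */
      return toy_expression (ps);   /* the recursion the patch guards */
    default:
      ps->p++;                      /* ordinary character */
      break;
    }
  while (*ps->p == '*')             /* postfix repetition */
    ps->p++;
  return TOY_OK;
}

/* A branch is a run of expressions up to '|', a closing ')' or the end.  */
static enum toy_err
toy_branch (struct toy_parser *ps)
{
  while (*ps->p != '\0' && *ps->p != '|'
         && !(ps->nest > 0 && *ps->p == ')'))
    {
      enum toy_err err = toy_expression (ps);
      if (err != TOY_OK)
        return err;
    }
  return TOY_OK;
}

static enum toy_err
toy_alternation (struct toy_parser *ps)
{
  enum toy_err err = toy_branch (ps);
  while (err == TOY_OK && *ps->p == '|')
    {
      ps->p++;
      err = toy_branch (ps);
    }
  return err;
}

static enum toy_err
toy_compile (const char *re, int apply_fix)
{
  struct toy_parser ps = { re, 0, apply_fix };
  return toy_alternation (&ps);
}
```

In this model toy_compile ("(*)b", 0) reports TOY_ERPAREN because the
recursion is entered with ')' as the current token, while
toy_compile ("(*)b", 1) accepts the pattern as if it were "()b".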

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: Fix regexp parsing with RE_CONTEXT_INDEP_OPS
  2003-12-13  9:56 Fix regexp parsing with RE_CONTEXT_INDEP_OPS Andreas Schwab
@ 2003-12-13 10:52 ` Jakub Jelinek
  2003-12-13 12:10   ` Andreas Schwab
From: Jakub Jelinek @ 2003-12-13 10:52 UTC
  To: Andreas Schwab; +Cc: libc-hacker

On Sat, Dec 13, 2003 at 10:56:28AM +0100, Andreas Schwab wrote:
> When parsed with RE_SYNTAX_EGREP (which includes RE_CONTEXT_INDEP_OPS)
> the regexp "(*)b" results in the error "unmatched ) or \)", but it
> should compile successfully (treating it as if "()b").  This is part
> of the Spencer tests in grep.

Then please add that into glibc testsuite as well.
IMHO all regex bugfixes should be accompanied by testsuite additions.
I think bug-regex13.c if you want it to use RE_SYNTAX_EGREP, or
bug-regex11.c if you want to use REG_EXTENDED (which has
RE_CONTEXT_INDEP_OPS set too), would be a natural choice.
Or rxspencer/tests if you think it comes really from Spencer.
What surprises me is that (*)b actually doesn't come from Spencer's
testsuite, at least not rxspencer-alpha3.8.g3.tar.bz2.
The only thing it has in the original is:
a\(*\)b         b       a*b     a*b
(i.e. as BRE).
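
The remark above that both candidate syntaxes carry RE_CONTEXT_INDEP_OPS
can be checked with a small sketch, assuming glibc's <regex.h> with
_GNU_SOURCE, and that REG_EXTENDED selects RE_SYNTAX_POSIX_EXTENDED
inside regcomp.  The helper syntax_has is a made-up name:

```c
#define _GNU_SOURCE   /* exposes the RE_* syntax bits on glibc */
#include <regex.h>

/* Hypothetical helper: nonzero if every bit of FLAG is set in SYNTAX.  */
static int
syntax_has (reg_syntax_t syntax, reg_syntax_t flag)
{
  return (syntax & flag) == flag;
}
```

Both syntax_has (RE_SYNTAX_EGREP, RE_CONTEXT_INDEP_OPS) and
syntax_has (RE_SYNTAX_POSIX_EXTENDED, RE_CONTEXT_INDEP_OPS) should be
nonzero.  The two syntaxes may still treat a leading "*" differently if
other bits such as RE_CONTEXT_INVALID_OPS differ, so a test under one
syntax does not necessarily cover the other.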

	Jakub

* Re: Fix regexp parsing with RE_CONTEXT_INDEP_OPS
  2003-12-13 10:52 ` Jakub Jelinek
@ 2003-12-13 12:10   ` Andreas Schwab
  2003-12-14 23:08     ` Ulrich Drepper
From: Andreas Schwab @ 2003-12-13 12:10 UTC
  To: Jakub Jelinek; +Cc: libc-hacker

Jakub Jelinek <jakub@redhat.com> writes:

> Then please add that into glibc testsuite as well.

Ok.

> Or rxspencer/tests if you think it comes really from Spencer.
> What surprises me is that (*)b actually doesn't come from Spencer's
> testsuite, at least not rxspencer-alpha3.8.g3.tar.bz2.

It's in tests/spencer1.tests in the grep sources, whereas the
rxspencer tests are in tests/spencer2.tests, AFAICS.  Posixly the
behaviour of "(*)" is undefined, which is probably why it's not in
rxspencer.  But it looks like it's part of traditional egrep
behaviour, the spencer1 tests have been part of grep at least since
version 2.0.

Andreas.

2003-12-13  Andreas Schwab  <schwab@suse.de>

	* posix/regcomp.c (parse_expression): Don't recurse when the end
	of a subexp is seen after a repetition operator with
	RE_CONTEXT_INVALID_DUP.
	* posix/bug-regex13.c: Add test for "(*)b".

--- posix/bug-regex13.c.~1.3.~	2002-12-03 00:42:00.000000000 +0100
+++ posix/bug-regex13.c	2003-12-13 12:47:48.000000000 +0100
@@ -1,5 +1,5 @@
 /* Regular expression tests.
-   Copyright (C) 2002 Free Software Foundation, Inc.
+   Copyright (C) 2002, 2003 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
    Contributed by Isamu Hasegawa <isamu@yamato.ibm.com>, 2002.
 
@@ -34,7 +34,8 @@ static struct
 } tests[] = {
   {RE_BACKSLASH_ESCAPE_IN_LISTS, "[0\\-9]", "1", -1}, /* It should not match.  */
   {RE_BACKSLASH_ESCAPE_IN_LISTS, "[0\\-9]", "-", 0}, /* It should match.  */
-  {RE_SYNTAX_POSIX_BASIC, "s1\n.*\ns3", "s1\ns2\ns3", 0}
+  {RE_SYNTAX_POSIX_BASIC, "s1\n.*\ns3", "s1\ns2\ns3", 0},
+  {RE_SYNTAX_EGREP, "(*)b", "b", 0}
 };
 
 int
--- posix/regcomp.c.~1.67.~	2003-11-30 00:08:06.000000000 +0100
+++ posix/regcomp.c	2003-12-13 09:59:26.000000000 +0100
@@ -2191,6 +2191,8 @@ parse_expression (regexp, preg, token, s
       else if (syntax & RE_CONTEXT_INDEP_OPS)
 	{
 	  fetch_token (token, regexp, syntax);
+	  if (token->type == OP_CLOSE_SUBEXP || token->type == OP_ALT)
+	    return NULL;
 	  return parse_expression (regexp, preg, token, syntax, nest, err);
 	}
       /* else fall through  */


* Re: Fix regexp parsing with RE_CONTEXT_INDEP_OPS
  2003-12-13 12:10   ` Andreas Schwab
@ 2003-12-14 23:08     ` Ulrich Drepper
  2003-12-15 12:18       ` Andreas Schwab
From: Ulrich Drepper @ 2003-12-14 23:08 UTC
  To: Andreas Schwab; +Cc: libc-hacker

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andreas Schwab wrote:
> Posixly the
> behaviour of "(*)" is undefined, which is probably why it's not in
> rxspencer.  But it looks like it's part of traditional egrep
> behaviour,

Where do you have this "traditional egrep behavior" information from?
Solaris' egrep fails with errors for "(*)b" and "()b".  Which
implementation behaves the way you expect it?

- -- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/3O0T2ijCOnn/RHQRArUrAJ4xsKDAGPvtTKov18VC467fMRnfcgCfbrW9
fhM/iCNCyvlc7CjjXA9OQTo=
=oOha
-----END PGP SIGNATURE-----

* Re: Fix regexp parsing with RE_CONTEXT_INDEP_OPS
  2003-12-14 23:08     ` Ulrich Drepper
@ 2003-12-15 12:18       ` Andreas Schwab
From: Andreas Schwab @ 2003-12-15 12:18 UTC
  To: Ulrich Drepper; +Cc: libc-hacker

Ulrich Drepper <drepper@redhat.com> writes:

> Andreas Schwab wrote:
>> Posixly the
>> behaviour of "(*)" is undefined, which is probably why it's not in
>> rxspencer.  But it looks like it's part of traditional egrep
>> behaviour,
>
> Where do you have this "traditional egrep behavior" information from?

It's part of the original GNU grep testsuite.  I didn't claim it's
first hand information.

> Solaris' egrep fails with errors for "(*)b" and "()b".

Which is ok, since POSIX makes it undefined.  Which errors, btw?

> Which implementation behaves the way you expect it?

The old GNU regexp implementation.  The current behaviour violates
the specs for RE_CONTEXT_INDEP_OPS, and the error message does not
make any sense.  This is independent of the actual GNU/BSD/Solaris
etc. egrep behaviour.

Andreas.

