public inbox for glibc-bugs-regex@sourceware.org
* [Bug regex/14301] New: Regular expression wrong match with a number of groups
@ 2012-06-27 7:26 valery_reznic at yahoo dot com
From: valery_reznic at yahoo dot com @ 2012-06-27 7:26 UTC
To: glibc-bugs-regex
http://sourceware.org/bugzilla/show_bug.cgi?id=14301
Bug #: 14301
Summary: Regular expression wrong match with a number of groups
Product: glibc
Version: 2.11
Status: NEW
Severity: normal
Priority: P2
Component: regex
AssignedTo: unassigned@sourceware.org
ReportedBy: valery_reznic@yahoo.com
CC: drepper.fsp@gmail.com
Classification: Unclassified
Created attachment 6487
--> http://sourceware.org/bugzilla/attachment.cgi?id=6487
test program for bug demonstration
I compile the regular expression with REG_EXTENDED | REG_ICASE (see the attached
program).
Then I run it as follows
(first parameter: the regular expression; second: the text to match):
1. ./r '[ ](4[0-9]{15})|(4[0-9]{12})[ ]' ' 4123456789012 ' - match
2. ./r '[ ](4[0-9]{15})|(4[0-9]{12})[ ]' 'r4123456789012' - no match
3. ./r '[ ](4[0-9]{15})|(4[0-9]{12})|(AAA)[ ]' 'r4123456789012' - match
4. ./r '[ ](4[0-9]{12})|(4[0-9]{15})|(AAA)[ ]' 'r4123456789012' - no match
1. Match, as expected: second group (4[0-9]{12}).
2. No match, as expected.
3. Match, when there should be none.
4. Just out of interest: with the first and second groups, 4[0-9]{15} and 4[0-9]{12},
swapped, all of a sudden it works as expected.
The attached program can be compiled with:
gcc -Wall r.c -o r
I tested it on systems with the following glibc versions:
- 2.3.2 Fedora Core 1, both i386 and x86-64
- 2.11 Fedora 12 - x86-64
- 2.5 CentOS release 5.6 (Final) - i386
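The attached r.c is not reproduced in this thread. A minimal sketch of an equivalent check, assuming the program simply calls POSIX regcomp() with REG_EXTENDED | REG_ICASE and then regexec() on the second argument (the helper name matches() is hypothetical), might look like this:

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical stand-in for the attached r.c: compile PATTERN with
 * REG_EXTENDED | REG_ICASE and report whether TEXT matches anywhere.
 * Returns 1 on match, 0 on no match, -1 if the pattern fails to compile. */
int matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_ICASE);
    if (rc != 0) {
        char buf[256];
        regerror(rc, &re, buf, sizeof buf);
        fprintf(stderr, "regcomp: %s\n", buf);
        return -1;               /* no regfree() after a failed regcomp() */
    }
    /* nmatch = 0: only the yes/no answer is needed here. */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Calling matches() with the four pattern/text pairs above should return 1, 0, 1 and 0 respectively, i.e. the same results as reported. The sketch only reports whether a match exists; it does not print submatches.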
--
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
* [Bug regex/14301] Regular expression wrong match with a number of groups
From: schwab@linux-m68k.org @ 2012-06-27 7:48 UTC
To: glibc-bugs-regex
http://sourceware.org/bugzilla/show_bug.cgi?id=14301
Andreas Schwab <schwab@linux-m68k.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |INVALID
--- Comment #1 from Andreas Schwab <schwab@linux-m68k.org> 2012-06-27 07:47:26 UTC ---
(4[0-9]{12}) matches 4123456789012.
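To see where that match lands, here is a sketch (hypothetical helpers find_match() and print_match(), built on the standard regcomp()/regexec() interface) that records the submatch offsets. regexec() looks for the pattern anywhere in the string unless it is anchored with ^ or $, so for case 3 on 'r4123456789012' the whole match is expected to start at offset 1, right after the 'r', with only group 2 participating.

```c
#include <regex.h>
#include <stdio.h>

/* Run PATTERN (REG_EXTENDED | REG_ICASE) against TEXT and store up to N
 * submatch offsets in M. Returns 0 on match, REG_NOMATCH if there is none,
 * or -1 if compilation fails. m[0] is the whole match; m[i].rm_so == -1
 * means group i did not take part in the match. */
int find_match(const char *pattern, const char *text, regmatch_t *m, size_t n)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return -1;
    int rc = regexec(&re, text, n, m, 0);
    regfree(&re);
    return rc;
}

/* Print every submatch that participated, as [start,end) plus its text. */
void print_match(const char *text, const regmatch_t *m, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (m[i].rm_so == -1)
            continue;
        printf("group %zu: [%d,%d) \"%.*s\"\n", i,
               (int)m[i].rm_so, (int)m[i].rm_eo,
               (int)(m[i].rm_eo - m[i].rm_so), text + m[i].rm_so);
    }
}
```

With case 3's pattern and 'r4123456789012', m[0] and m[2] should both span [1,14), while m[1] and m[3] stay at -1 because their alternatives took no part in the match.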
* [Bug regex/14301] Regular expression wrong match with a number of groups
From: valery_reznic at yahoo dot com @ 2012-06-27 8:16 UTC
To: glibc-bugs-regex
http://sourceware.org/bugzilla/show_bug.cgi?id=14301
Valery <valery_reznic at yahoo dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|INVALID |
--- Comment #2 from Valery <valery_reznic at yahoo dot com> 2012-06-27 08:15:30 UTC ---
(In reply to comment #1)
> (4[0-9]{12}) matches 4123456789012.
So what?
The regular expression requires a leading and a trailing space to be present for a match.
IMO, the result for case 3 is wrong.
Also, please note that in cases 3 and 4 the regular expressions are essentially the
same: only (4[0-9]{15}) and (4[0-9]{12}), which are connected with OR, are swapped.
But the results are different.
* [Bug regex/14301] Regular expression wrong match with a number of groups
From: fweimer at redhat dot com @ 2012-06-27 8:33 UTC
To: glibc-bugs-regex
http://sourceware.org/bugzilla/show_bug.cgi?id=14301
Florian Weimer <fweimer at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |RESOLVED
CC| |fweimer at redhat dot com
Resolution| |INVALID
--- Comment #3 from Florian Weimer <fweimer at redhat dot com> 2012-06-27 08:32:43 UTC ---
(In reply to comment #2)
> (In reply to comment #1)
> > (4[0-9]{12}) matches 4123456789012.
>
> So what?
> The regular expression requires a leading and a trailing space to be present for a match.
I don't think so. "|" does not bind this way. a[b]|c|d[e] is equivalent to
(ab)|c|(de), not (a)(b|c|d)(e).
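Florian's rule can be checked directly. The sketch below wraps each pattern in ^( )$ so that only whole-string matches count; that wrapping and the helper name whole_match() are illustrative additions, not part of the bug report.

```c
#include <regex.h>
#include <stdio.h>

/* Returns 1 if TEXT as a whole matches the ERE ALT, 0 if not, -1 on error.
 * ALT is wrapped as ^(ALT)$ so that a '|' inside ALT cannot escape the
 * anchors. */
int whole_match(const char *alt, const char *text)
{
    char pattern[256];
    regex_t re;

    if (snprintf(pattern, sizeof pattern, "^(%s)$", alt) >= (int)sizeof pattern)
        return -1;                      /* pattern too long for the buffer */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Under this wrapping, a[b]|c|d[e] should accept "ab", "c" and "de" but reject "ace", while (a)(b|c|d)(e) accepts "ace" and rejects "c". In POSIX EREs, alternation has the lowest precedence of all operators, below concatenation.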
> Also, please note that in cases 3 and 4 the regular expressions are essentially the
> same: only (4[0-9]{15}) and (4[0-9]{12}), which are connected with OR, are swapped.
This also explains the difference when swapping the parenthesized constructs.
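Put differently, each top-level branch is tried on its own, and the pattern matches if any branch does. The hypothetical helper test_branches() below splits a pattern at '|' characters outside parentheses and bracket expressions (a simplified scanner, not a full ERE parser) and tests every branch separately against the text.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split ERE PATTERN at top-level '|' (outside ( ) and
 * [ ]) and test each branch alone against TEXT with REG_EXTENDED | REG_ICASE.
 * Writes 1/0 per branch into HIT; returns the branch count, or -1 on error.
 * Backslash escapes are skipped; character classes like [[:alpha:]] are not
 * fully handled, so this is only a sketch. */
int test_branches(const char *pattern, const char *text, int *hit, int max)
{
    char branch[256];
    int depth = 0, count = 0;
    size_t start = 0, len = strlen(pattern);

    for (size_t i = 0; i <= len; i++) {
        char c = pattern[i];
        if (c == '\\' && i + 1 < len) {
            i++;                          /* skip the escaped character */
        } else if (c == '[') {
            i++;
            if (pattern[i] == '^') i++;
            if (pattern[i] == ']') i++;   /* a leading ']' is literal */
            while (i < len && pattern[i] != ']') i++;
        } else if (c == '(') {
            depth++;
        } else if (c == ')') {
            depth--;
        } else if ((c == '|' && depth == 0) || c == '\0') {
            size_t n = i - start;
            regex_t re;
            if (count >= max || n >= sizeof branch) return -1;
            memcpy(branch, pattern + start, n);
            branch[n] = '\0';
            if (regcomp(&re, branch, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
                return -1;
            hit[count++] = regexec(&re, text, 0, NULL, 0) == 0;
            regfree(&re);
            start = i + 1;
        }
    }
    return count;
}
```

For 'r4123456789012', case 3 is expected to yield hits {0, 1, 0}: the bare (4[0-9]{12}) branch matches on its own. Case 4 yields {0, 0, 0}, because its middle branch now needs 16 characters. Wrapping all three alternatives in one more pair of parentheses leaves a single top-level branch, so the leading and trailing [ ] then apply to all of them.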
* [Bug regex/14301] Regular expression wrong match with a number of groups
From: valery_reznic at yahoo dot com @ 2012-06-27 8:51 UTC
To: glibc-bugs-regex
http://sourceware.org/bugzilla/show_bug.cgi?id=14301
--- Comment #4 from Valery <valery_reznic at yahoo dot com> 2012-06-27 08:50:54 UTC ---
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > (4[0-9]{12}) matches 4123456789012.
> >
> > So what?
> > The regular expression requires a leading and a trailing space to be present for a match.
>
> I don't think so. "|" does not bind this way. a[b]|c|d[e] is equivalent to
> (ab)|c|(de), not (a)(b|c|d)(e).
>
> > Also, please note that in cases 3 and 4 the regular expressions are essentially the
> > same: only (4[0-9]{15}) and (4[0-9]{12}), which are connected with OR, are swapped.
>
> This also explains the difference when swapping the parenthesized constructs.
Got it! The leading space was part of only the first alternative, and the trailing space
was part of only the last one.
For some reason I thought that '|' has higher precedence than concatenation.
One more pair of parentheses fixed the problem:
'[ ]((4[0-9]{15})|(4[0-9]{12})|(AAA))[ ]'
Thank you very much for the explanation. I can't believe how stupid I was.
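A sketch of the corrected pattern in use, under the same assumption of REG_EXTENDED | REG_ICASE; the helper which_alternative() is hypothetical. With the extra outer group, group 1 holds the token between the spaces and groups 2 to 4 correspond to the three alternatives, so checking rm_so tells which one matched.

```c
#include <regex.h>
#include <stdio.h>

/* The corrected pattern from the comment above: the outer group makes the
 * leading and trailing [ ] apply to every alternative.
 * Group 1 = whole token, groups 2-4 = the individual alternatives. */
static const char *FIXED = "[ ]((4[0-9]{15})|(4[0-9]{12})|(AAA))[ ]";

/* Returns the number (2, 3 or 4) of the alternative that matched inside
 * TEXT, 0 if there is no match, or -1 if the pattern fails to compile. */
int which_alternative(const char *text)
{
    regex_t re;
    regmatch_t m[5];
    int result = 0;

    if (regcomp(&re, FIXED, REG_EXTENDED | REG_ICASE) != 0)
        return -1;
    if (regexec(&re, text, 5, m, 0) == 0) {
        for (int g = 2; g <= 4; g++)
            if (m[g].rm_so != -1)     /* -1: alternative did not participate */
                result = g;
    }
    regfree(&re);
    return result;
}
```

For example, ' 4123456789012 ' should report alternative 3, ' 4123456789012345 ' alternative 2, ' aaa ' alternative 4 (thanks to REG_ICASE), and 'r4123456789012' no match at all.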
* [Bug regex/14301] Regular expression wrong match with a number of groups
From: fweimer at redhat dot com @ 2014-06-13 14:58 UTC
To: glibc-bugs-regex
https://sourceware.org/bugzilla/show_bug.cgi?id=14301
Florian Weimer <fweimer at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags| |security-