public inbox for glibc-bugs-regex@sourceware.org
* [Bug regex/9697] New: character does not match neither [a-z] nor [^a-z]
@ 2008-12-30 17:48 bonzini at gnu dot org
2008-12-31 13:07 ` [Bug regex/9697] " bonzini at gnu dot org
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: bonzini at gnu dot org @ 2008-12-30 17:48 UTC (permalink / raw)
To: glibc-bugs-regex
(This is Debian bug 510219).
For instance, take U+02E2 MODIFIER LETTER SMALL S:
$ echo ˢ | sed -r 's/[a-z]|[^a-z]//'
ˢ
Expected output: nothing.
Sed treats ˢ (U+02E2) neither as a letter (matched by [a-z]) nor as a
non-letter (matched by [^a-z]).
I will analyze this soonish.
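
The invariant behind this report can be sketched in Python. This uses
Python's own re module purely as a reference point, not glibc's regex
code: any single character should match exactly one of [a-z] and [^a-z],
so the substitution above should leave an empty line. The helper names
matches_exactly_one and strip_first are made up for illustration.

```python
import re


def matches_exactly_one(ch):
    """True if ch matches exactly one of [a-z] and [^a-z]."""
    in_set = re.fullmatch(r"[a-z]", ch) is not None
    in_complement = re.fullmatch(r"[^a-z]", ch) is not None
    return in_set != in_complement


def strip_first(line):
    """Rough analogue of sed -r 's/[a-z]|[^a-z]//' applied to one line."""
    return re.sub(r"[a-z]|[^a-z]", "", line, count=1)


if __name__ == "__main__":
    modifier_s = "\u02e2"  # U+02E2 MODIFIER LETTER SMALL S
    print(matches_exactly_one(modifier_s))
    print(repr(strip_first(modifier_s)))
```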
--
Summary: character does not match neither [a-z] nor [^a-z]
Product: glibc
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: regex
AssignedTo: drepper at redhat dot com
ReportedBy: bonzini at gnu dot org
CC: glibc-bugs-regex at sources dot redhat dot com,
    glibc-bugs at sources dot redhat dot com
http://sourceware.org/bugzilla/show_bug.cgi?id=9697
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
* [Bug regex/9697] character does not match neither [a-z] nor [^a-z]
From: bonzini at gnu dot org @ 2008-12-31 13:07 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From bonzini at gnu dot org 2008-12-31 13:06 -------
Probably related to the fastmap: the bracket matches after a literal
prefix, but not at the start of the pattern:
~$ sed 's/a[^a-z]/ax/g' <<< a˚b # correct
axb
~$ sed 's/[^a-z]/x/g' <<< a˚b # wrong
a˚b
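
A toy model of why the literal prefix matters, in Python. It assumes (this
is not stated in the report) that the fastmap is a set of bytes at which a
match attempt may start, and that the pre-fix fastmap for [^a-z] simply
lacked the UTF-8 lead bytes. The names candidate_starts, BUGGY_FASTMAP,
FIXED_FASTMAP and PREFIX_A_FASTMAP are hypothetical, and ˚ is taken to be
U+02DA RING ABOVE.

```python
def candidate_starts(data, fastmap):
    """Byte offsets where the matcher would even attempt a match."""
    return [i for i, b in enumerate(data) if b in fastmap]


# "a˚b" encoded as UTF-8: b'a\xcb\x9ab'
TEXT = "a\u02dab".encode("utf-8")

# Assumed pre-fix fastmap for [^a-z]: ASCII bytes outside a-z only.
BUGGY_FASTMAP = {b for b in range(0x80) if not 0x61 <= b <= 0x7A}

# Same set plus every byte that can start a multibyte UTF-8 sequence.
FIXED_FASTMAP = BUGGY_FASTMAP | set(range(0xC2, 0xF5))

# For a[^a-z] a match can only start at the byte 'a'.
PREFIX_A_FASTMAP = {0x61}

if __name__ == "__main__":
    print(candidate_starts(TEXT, BUGGY_FASTMAP))     # no attempt at all
    print(candidate_starts(TEXT, FIXED_FASTMAP))     # attempt at the lead byte
    print(candidate_starts(TEXT, PREFIX_A_FASTMAP))  # attempt at 'a'
```

With the prefix, the attempt starts at 'a' and the full matcher then sees
the multibyte character, which would explain why the first command works.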
* [Bug regex/9697] character does not match neither [a-z] nor [^a-z]
From: bonzini at gnu dot org @ 2008-12-31 15:36 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From bonzini at gnu dot org 2008-12-31 15:34 -------
Created an attachment (id=3629)
--> (http://sourceware.org/bugzilla/attachment.cgi?id=3629&action=view)
tentative untested patch
The problem is basically that __btowc cannot distinguish the beginning of a
valid multibyte character from an invalid sequence.
But it is possible to make the fastmap even better, and that's what the
attached patch tries to do.
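
The ambiguity can be illustrated with a rough Python analogue. Decoding
one byte in isolation stands in for __btowc here; a stateful incremental
decoder stands in for a restartable conversion such as mbrtowc, which in C
reports an incomplete sequence separately from an invalid one. Whether the
attached patch takes that route is not stated above; decode_single_byte
and classify_byte are illustrative names only.

```python
import codecs


def decode_single_byte(b):
    """Decode one byte on its own; None on failure (like btowc's WEOF)."""
    try:
        return bytes([b]).decode("utf-8")
    except UnicodeDecodeError:
        return None


def classify_byte(b):
    """Classify one byte as 'complete', 'incomplete' or 'invalid'."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        out = decoder.decode(bytes([b]), final=False)
    except UnicodeDecodeError:
        return "invalid"
    return "complete" if out else "incomplete"


if __name__ == "__main__":
    # 0x61 is 'a', 0xCB is the first byte of U+02E2, 0xFF is never valid.
    for byte in (0x61, 0xCB, 0xFF):
        print(hex(byte), decode_single_byte(byte), classify_byte(byte))
```

The single-byte view returns the same failure for 0xCB and 0xFF, while the
stateful view separates them.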
* [Bug regex/9697] character does not match neither [a-z] nor [^a-z]
From: bonzini at gnu dot org @ 2009-01-03 15:13 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From bonzini at gnu dot org 2009-01-03 15:08 -------
Created an attachment (id=3634)
--> (http://sourceware.org/bugzilla/attachment.cgi?id=3634&action=view)
working patch
Here is the logic I used:
- for [^...], [[:class:]], and [[=elem=]], try at every position that starts
a multibyte character. The correct single-byte character positions are
chosen by the corresponding SIMPLE_BRACKET.
- for [[.elem.]] and ranges, try at every position that might start a
multibyte collation element. Again, single-byte collation elements are
taken care of by SIMPLE_BRACKETs.
- unless the second bullet applies, of course, multibyte characters
must be added separately to the fastmap.
Tested on i686-pc-linux-gnu together with the other patch I sent on
2008-12-31, on which this one depends. Ok?
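
A minimal Python sketch of the logic above, assuming a UTF-8 locale. It is
a simplified model, not the patch itself: "every position that starts a
multibyte character" is approximated by the UTF-8 lead-byte range, the
first two bullets are folded into a single open_ended flag, and the name
bracket_fastmap and its parameters are hypothetical.

```python
# Bytes that can begin a multibyte UTF-8 sequence.
UTF8_LEAD_BYTES = frozenset(range(0xC2, 0xF5))


def bracket_fastmap(single_bytes, multibyte_chars=(), open_ended=False):
    """Return the set of first bytes at which a bracket may match.

    single_bytes    -- bytes accepted by the single-byte part of the bracket
    multibyte_chars -- multibyte characters listed explicitly in the bracket
    open_ended      -- True for [^...], classes, equivalence classes,
                       collating elements and ranges
    """
    fastmap = set(single_bytes)
    # Explicitly listed multibyte characters contribute their first byte.
    for ch in multibyte_chars:
        fastmap.add(ch.encode("utf-8")[0])
    # Open-ended brackets must be tried wherever a multibyte char may start.
    if open_ended:
        fastmap |= UTF8_LEAD_BYTES
    return fastmap


if __name__ == "__main__":
    non_lower = [b for b in range(0x80) if not 0x61 <= b <= 0x7A]
    negated = bracket_fastmap(non_lower, open_ended=True)     # [^a-z]
    listed = bracket_fastmap(b"a", multibyte_chars="\u02e2")  # [aˢ]
    print(0xCB in negated, sorted(hex(b) for b in listed))
```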
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #3629 is|0                           |1
           obsolete|                            |
         AssignedTo|drepper at redhat dot com   |bonzini at gnu dot org
             Status|NEW                         |ASSIGNED
* [Bug regex/9697] character does not match neither [a-z] nor [^a-z]
From: bonzini at gnu dot org @ 2009-01-03 15:14 UTC (permalink / raw)
To: glibc-bugs-regex
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #3634|2009-01-03-Paolo-Bonzini-   |fix-9697.patch
           filename|bonzini-gnu.org.patch       |
* [Bug regex/9697] character does not match neither [a-z] nor [^a-z]
From: drepper at redhat dot com @ 2009-01-08 0:43 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From drepper at redhat dot com 2009-01-08 00:43 -------
Patch is in cvs.
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED
* [Bug regex/9697] character does not match neither [a-z] nor [^a-z]
From: fweimer at redhat dot com @ 2014-07-02 7:29 UTC (permalink / raw)
To: glibc-bugs-regex
https://sourceware.org/bugzilla/show_bug.cgi?id=9697
Florian Weimer <fweimer at redhat dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-