* [Bug regex/1149] character class with range doesn't match half-width kana in SJIS locale
2005-08-02 4:37 [Bug regex/1149] New: character class with range doesn't match half-width kana in SJIS locale kimura dot koichi at canon dot co dot jp
@ 2005-09-27 20:05 ` drepper at redhat dot com
2006-01-27 6:00 ` kimura dot koichi at canon dot co dot jp
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: drepper at redhat dot com @ 2005-09-27 20:05 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From drepper at redhat dot com 2005-09-27 20:05 -------
You really cannot use character ranges outside the C locale, since their
definition depends on the locale description, more specifically the collation
data. That data currently doesn't contain anything for these characters. And
even if it did, there is no guarantee that the result would be what you expect.
Just don't use ranges.
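A minimal sketch of the two spellings, assuming a POSIX `<regex.h>`; the helper name `matches` is hypothetical and not part of glibc. In the C locale a range such as `[a-c]` follows byte order, while in other locales its meaning depends on the collation data, so listing the members explicitly is the form that sidesteps the issue described above:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches the POSIX extended
   regular expression `pattern`, 0 if it does not, and -1 if the
   pattern fails to compile.  The result depends on the locale that is
   active when regcomp() runs. */
static int matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

In the default C locale, `matches("[a-c]", "b")` and `matches("[abc]", "b")` both return 1. Only the enumerated form is independent of the collation order once `setlocale` selects another locale.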
--
What       |Removed  |Added
----------------------------------------------------------------------------
Status     |NEW      |RESOLVED
Resolution |         |WONTFIX
http://sourceware.org/bugzilla/show_bug.cgi?id=1149
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
* [Bug regex/1149] character class with range doesn't match half-width kana in SJIS locale
From: kimura dot koichi at canon dot co dot jp @ 2006-01-27 6:00 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From kimura dot koichi at canon dot co dot jp 2006-01-27 06:00 -------
You say that I should not use character ranges outside the C locale.
But I still have a question.
Why are the characters that form the start and end of the range not printed?
Half-width katakana characters in the SJIS locale are one byte wide
(the code point is below 0xff) but have large code points in Unicode (above
U+0100).
In regcomp.c, I think half-width katakana characters should be registered in
the fastmap as single-byte characters.
And in regexec.c, half-width katakana characters should be treated as
single-byte characters, and bitset_set() should be called to register them in
the bitmap.
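The distinction drawn above, between the byte value of a character and the wide-character value that `btowc` returns for it, can be sketched as follows. This is only an illustration: the names `byte_kind` and `classify_byte` are hypothetical, and the threshold 0x100 assumes that `SBC_MAX` in the regex sources is 256.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical classification of a single byte in the current locale. */
enum byte_kind {
    BYTE_INVALID,    /* not a complete one-byte character (e.g. a lead byte) */
    BYTE_NARROW,     /* one-byte character whose wide value is below 0x100 */
    BYTE_WIDE_VALUE  /* one-byte character whose wide value is 0x100 or more */
};

static enum byte_kind classify_byte(int byte)
{
    wint_t wc = btowc(byte);

    if (wc == WEOF)
        return BYTE_INVALID;
    /* 0x100 stands in for SBC_MAX (assumed to be 256). */
    return wc < 0x100 ? BYTE_NARROW : BYTE_WIDE_VALUE;
}
```

In the C locale `classify_byte('A')` yields `BYTE_NARROW`. In a Shift_JIS locale (for example after `setlocale(LC_ALL, "ja_JP.SJIS")`, assuming such a locale is installed), byte 0xB1 is half-width katakana A (U+FF71) and would be expected to yield `BYTE_WIDE_VALUE`, which is the case the comment above is concerned with.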
--
What       |Removed  |Added
----------------------------------------------------------------------------
Status     |RESOLVED |REOPENED
Resolution |WONTFIX  |
* [Bug regex/1149] character class with range doesn't match half-width kana in SJIS locale
From: kimura dot koichi at canon dot co dot jp @ 2006-02-01 4:48 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From kimura dot koichi at canon dot co dot jp 2006-02-01 04:48 -------
(In reply to comment #2)
I think I have found the source of the problem.
Here is a patch.
--- regcomp.c.1~ 2005-07-18 11:51:43.000000000 +0900
+++ regcomp.c 2006-02-01 13:26:41.078750000 +0900
@@ -397,9 +397,13 @@ re_compile_fastmap_iter (bufp, init_stat
}
# else
if (dfa->mb_cur_max > 1)
- for (i = 0; i < SBC_MAX; ++i)
- if (__btowc (i) == WEOF)
- re_set_fastmap (fastmap, icase, i);
+ for (i = 0; i < SBC_MAX; ++i) {
+ wint_t wc;
+ wc = __btowc (i);
+
+ if (wc == WEOF || wc >= SBC_MAX)
+ re_set_fastmap (fastmap, icase, i);
+ }
# endif /* not _LIBC */
}
for (i = 0; i < cset->nmbchars; ++i)
--- regexec.c.1~ 2005-07-18 11:51:42.000000000 +0900
+++ regexec.c 2006-02-01 13:26:44.016250000 +0900
@@ -3715,6 +3715,7 @@ check_node_accept_bytes (dfa, node_idx,
const re_token_t *node = dfa->nodes + node_idx;
int char_len, elem_len;
int i;
+ wint_t wc;
if (BE (node->type == OP_UTF8_PERIOD, 0))
{
@@ -3784,7 +3785,8 @@ check_node_accept_bytes (dfa, node_idx,
}
elem_len = re_string_elem_size_at (input, str_idx);
- if ((elem_len <= 1 && char_len <= 1) || char_len == 0)
+ wc = __btowc(*(input->mbs+str_idx));
+ if (((elem_len <= 1 && char_len <= 1) || char_len == 0) && (wc != WEOF && wc < SBC_MAX))
return 0;
if (node->type == COMPLEX_BRACKET)
This patch covers only the non-_LIBC part, since I could not follow the flow of
the _LIBC part.
* [Bug regex/1149] character class with range doesn't match half-width kana in SJIS locale
From: drepper at redhat dot com @ 2006-04-25 18:12 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From drepper at redhat dot com 2006-04-25 18:12 -------
Patches for non-_LIBC shouldn't be sent here. This is the *libc* bugzilla.
Send them to the sed list and let those people look at them.
--
What       |Removed  |Added
----------------------------------------------------------------------------
Status     |REOPENED |RESOLVED
Resolution |         |WONTFIX
* [Bug regex/1149] character class with range doesn't match half-width kana in SJIS locale
From: bonzini at gnu dot org @ 2006-04-26 7:04 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From bonzini at gnu dot org 2006-04-26 07:04 -------
So you WONTFIX a bug just because the patch sent is not for glibc? Either the
bug is invalid, and you mark it as INVALID; or you just ignore the patch. But
not WONTFIX.
The patch is not OK because it unnecessarily slows down the function, and regex
is already slow enough. We should probably cache the results of btowc (at least
for the non-_LIBC case).
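One way the caching idea could look, as a sketch only: a 256-entry table filled once with the `btowc` result for every byte, so that the matching loop does a table lookup instead of a conversion call. The names `btowc_cache`, `btowc_cache_fill` and `cached_btowc` are hypothetical and do not come from the regex sources; the table is not thread-safe and would have to be refilled if the locale changes, so a real implementation might rather keep it with the compiled pattern.

```c
#include <limits.h>
#include <wchar.h>

#define BYTE_COUNT (UCHAR_MAX + 1)

/* Hypothetical cache: btowc() result for every possible byte value. */
static wint_t btowc_cache[BYTE_COUNT];
static int btowc_cache_ready = 0;

/* Fill the table for the locale that is active right now. */
static void btowc_cache_fill(void)
{
    int i;

    for (i = 0; i < BYTE_COUNT; ++i)
        btowc_cache[i] = btowc(i);
    btowc_cache_ready = 1;
}

/* Table lookup replacing a btowc() call per input byte. */
static wint_t cached_btowc(unsigned char byte)
{
    if (!btowc_cache_ready)
        btowc_cache_fill();
    return btowc_cache[byte];
}
```

A caller would use `cached_btowc(c)` wherever the patch calls `__btowc`, and call `btowc_cache_fill()` again after any `setlocale` call that changes the character encoding.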
--
What       |Removed  |Added
----------------------------------------------------------------------------
Status     |RESOLVED |REOPENED
Resolution |WONTFIX  |
* [Bug regex/1149] character class with range doesn't match half-width kana in SJIS locale
From: drepper at redhat dot com @ 2006-05-02 22:33 UTC (permalink / raw)
To: glibc-bugs-regex
------- Additional Comments From drepper at redhat dot com 2006-05-02 22:33 -------
This is glibc's bugzilla. I mark it WONTFIX because I have nothing to do with
the non-glibc code. Stop reopening.
--
What       |Removed  |Added
----------------------------------------------------------------------------
Status     |REOPENED |RESOLVED
Resolution |         |WONTFIX