public inbox for glibc-bugs@sourceware.org
* [Bug manual/12045] regex range semantics outside of POSIX should be documented
From: schwab@linux-m68k.org @ 2012-12-19 10:51 UTC (permalink / raw)
To: glibc-bugs
http://sourceware.org/bugzilla/show_bug.cgi?id=12045
Andreas Schwab <schwab@linux-m68k.org> changed:
What      |Removed                     |Added
----------------------------------------------------------------------------
AssignedTo|drepper.fsp at gmail dot com|unassigned at sourceware dot org
--
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
* [Bug manual/12045] regex range semantics outside of POSIX should be documented
From: fweimer at redhat dot com @ 2014-06-30 8:01 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=12045
Florian Weimer <fweimer at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags| |security-
* [Bug manual/12045] regex range semantics outside of POSIX should be documented
2010-09-21 15:25 [Bug regex/12045] New: regex range semantics outside of POSIX should be documented and consistent eblake at redhat dot com
@ 2010-09-24 12:35 ` bonzini at gnu dot org
0 siblings, 0 replies; 3+ messages in thread
From: bonzini at gnu dot org @ 2010-09-24 12:35 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From bonzini at gnu dot org 2010-09-24 12:35 -------
It turns out that regex range semantics for glibc are "CEO". They _are_
consistent; it is the locale definition files that are not.
I created a file with the 52 uppercase and lowercase letters and ran
"sed -n /[A-Z]/p" on it. Depending on the locale, I get one of two results,
either these 26 letters:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
or these 51 (everything except lowercase z):
AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZ
Here are the "51" locales:
ar_SA cs_CZ hr_HR hsb_DE is_IS km_KH lo_LA lt_LT lv_LV or_IN pl_PL sk_SK
sl_SI th_TH tr_CY tr_TR
These return 51 for both $l and $l.utf8. Every other locale returns 26 for both
unibyte and multibyte variants.
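A rough way to reproduce the experiment described above, as a sketch only: the helper names, the A, a, B, b, ... input order, and the use of LC_ALL to select the locale are assumptions for illustration and are not taken from the original report. It feeds the 52 letters to sed, one per line, and counts the lines printed.

```python
import os
import string
import subprocess


def letters_input():
    """Return the 52 ASCII letters, one per line (order A, a, B, b, ... is assumed)."""
    pairs = zip(string.ascii_uppercase, string.ascii_lowercase)
    return "".join(f"{upper}\n{lower}\n" for upper, lower in pairs)


def count_range_matches(locale_name, script="/[A-Z]/p"):
    """Run `sed -n SCRIPT` with LC_ALL=locale_name and count the printed lines."""
    env = dict(os.environ, LC_ALL=locale_name)
    result = subprocess.run(
        ["sed", "-n", script],
        input=letters_input(),
        capture_output=True,
        text=True,
        env=env,
        check=True,
    )
    return len(result.stdout.splitlines())
```

On a system like the one described here, count_range_matches("cs_CZ.utf8") would be expected to report 51 and count_range_matches("en_US.utf8") 26, provided those locales are installed. Results on other glibc versions or other sed implementations may differ.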
Locales using glibc's localedata/locales/iso14651_t1_common template return 26.
This template defines the collation like this:
<U0061> <a>;<BAS>;<MIN>;IGNORE # 198 a start lowercase
<U00AA> <a>;<PCL>;<EMI>;IGNORE # 199 ª
<U00E1> <a>;<ACA>;<MIN>;IGNORE # 200 á
...
<U007A> <z>;<BAS>;<MIN>;IGNORE # 507 z
...
<U00FE> <th>;<BAS>;<MIN>;IGNORE # 516 þ end lowercase
<U0041> <a>;<BAS>;<CAP>;IGNORE # 517 A start uppercase
<U00C1> <a>;<ACA>;<CAP>;IGNORE # 518 Á
...
<U005A> <z>;<BAS>;<CAP>;IGNORE # 813 Z
...
<U00DE> <th>;<BAS>;<CAP>;IGNORE # 824 Þ end uppercase
(There's no end to surprises: [a-z] comes _before_ [A-Z], which is why [A-z]
fails but [a-Z] works).
Instead, the "special" locales above use a different sequence, for example in cs_CZ:
<U0041> <U0041>;<NONE>;<CAPITAL>;<U0041> # A
<U0061> <U0041>;<NONE>;<SMALL>;<U0041> # a
<U00AA> <U0041>;<NONE>;<U00AA>;<U0041> # ª
<U00C1> <U0041>;<ACUTE>;<CAPITAL>;<U0041> # Á
<U00E1> <U0041>;<ACUTE>;<SMALL>;<U0041> # á
...
<U005A> <U005A>;<NONE>;<CAPITAL>;<U005A> # Z
<U007A> <U005A>;<NONE>;<SMALL>;<U005A> # z
So, it looks like __collseq_table_lookup is what the POSIX rationale document
calls "CEO". I'll open a bug on the inconsistencies caused by using CEO. In
the meantime, this bug remains open for the documentation part.
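To illustrate why the two collation sequences above give 26 and 51 matches, here is a toy model; it is not glibc's implementation. It treats a range [lo-hi] as "every character whose position in the collation sequence lies between lo and hi", restricted to the 52 ASCII letters. The names TEMPLATE_ORDER, INTERLEAVED_ORDER and range_matches are hypothetical, and the ValueError for a reversed range only stands in for the failure to compile the regex.

```python
import string

# Simplified stand-in for the iso14651_t1_common order: all lowercase, then all uppercase.
TEMPLATE_ORDER = string.ascii_lowercase + string.ascii_uppercase

# Simplified stand-in for the cs_CZ-style order: A a B b ... Z z.
INTERLEAVED_ORDER = "".join(
    upper + lower
    for upper, lower in zip(string.ascii_uppercase, string.ascii_lowercase)
)


def range_matches(sequence, lo, hi):
    """Return the characters of `sequence` that fall inside [lo-hi] by position."""
    position = {ch: index for index, ch in enumerate(sequence)}
    if position[lo] > position[hi]:
        # Models a range whose start sorts after its end, as with [A-z] here.
        raise ValueError(f"invalid range end in [{lo}-{hi}]")
    return "".join(
        ch for ch in sequence if position[lo] <= position[ch] <= position[hi]
    )


print(len(range_matches(TEMPLATE_ORDER, "A", "Z")))     # 26: uppercase only
print(len(range_matches(INTERLEAVED_ORDER, "A", "Z")))  # 51: all but 'z'
print(len(range_matches(TEMPLATE_ORDER, "a", "Z")))     # 52: [a-Z] spans everything
```

In this model [A-z] is rejected under the template order because A sorts after z, while [a-Z] covers all 52 letters, which matches the behaviour noted above.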
--
What |Removed |Added
----------------------------------------------------------------------------
Component|regex |manual
Summary|regex range semantics outside of POSIX should be documented and consistent |regex range semantics outside of POSIX should be documented
http://sourceware.org/bugzilla/show_bug.cgi?id=12045