* Weird case-insensitive collation
@ 2006-10-19 11:35 Ludovic Courtès
2006-10-19 23:54 ` "Reshat Sabiq (Reşat)"
0 siblings, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2006-10-19 11:35 UTC (permalink / raw)
To: libc-locales
Hi,
`strcasecmp ()' behaves wrongly under the `fr_FR' locale. Consider the
following example program:
  #include <stdlib.h>
  #include <stdio.h>
  #include <locale.h>
  #include <strings.h>

  int
  main (int argc, char *argv[])
  {
    int result;

    if (!setlocale (LC_ALL, "fr_FR.ISO-8859-1"))
      abort ();

    result = strcasecmp ("été", "Hiver");
    printf ("result=%i\n", result);

    return (result < 0) ? 0 : 1;
  }
Under French collation conventions, letter `é' (`e' with acute) comes
before `h'. Thus, the word "été" should be "lower than" the word
"hiver". `strcoll ()' returns the right answer (a negative number) but
`strcasecmp ()' wrongfully returns a positive number, regardless of
whether "hiver" is spelt with a capital `H' or not.
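Since `strcoll ()' gives the expected ordering, one possible workaround is to fold case first and then let `strcoll ()' decide. The following is only a sketch under assumptions: `lowercase_copy' and `coll_casecmp' are hypothetical names (not glibc functions), and the byte-wise `tolower ()' folding is only meaningful for single-byte encodings such as ISO-8859-1, not for UTF-8.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate S with every byte passed through
   tolower (), which follows LC_CTYPE for single-byte encodings.
   Returns NULL if memory cannot be allocated.  */
static char *
lowercase_copy (const char *s)
{
  size_t len = strlen (s);
  char *copy = malloc (len + 1);
  size_t i;

  if (copy == NULL)
    return NULL;
  for (i = 0; i < len; i++)
    copy[i] = (char) tolower ((unsigned char) s[i]);
  copy[len] = '\0';
  return copy;
}

/* Hypothetical case-insensitive comparison that leaves the ordering
   to strcoll ().  If allocation fails, it falls back to strcoll ()
   on the unmodified strings.  */
int
coll_casecmp (const char *a, const char *b)
{
  char *la = lowercase_copy (a);
  char *lb = lowercase_copy (b);
  int result;

  if (la == NULL || lb == NULL)
    result = strcoll (a, b);
  else
    result = strcoll (la, lb);

  free (la);
  free (lb);
  return result;
}
```

The caller is still expected to select the locale with `setlocale ()' beforehand; without that call the program runs in the "C" locale, where `strcoll ()' orders strings the same way `strcmp ()' does.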
Is this a bug or am I missing something?
Thanks,
Ludovic.
* Re: Weird case-insensitive collation
2006-10-19 11:35 Weird case-insensitive collation Ludovic Courtès
@ 2006-10-19 23:54 ` "Reshat Sabiq (Reşat)"
2006-10-20 9:41 ` Ludovic Courtès
2006-10-23 21:59 ` Ludovic Courtès
0 siblings, 2 replies; 5+ messages in thread
From: "Reshat Sabiq (Reşat)" @ 2006-10-19 23:54 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: libc-locales
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ludovic Courtès wrote:
> Hi,
>
> `strcasecmp ()' behaves wrongly under the `fr_FR' locale. Consider the
> following example program:
>
> #include <stdlib.h>
> #include <stdio.h>
> #include <locale.h>
> #include <strings.h>
>
> int
> main (int argc, char *argv[])
> {
> int result;
>
> if (!setlocale (LC_ALL, "fr_FR.ISO-8859-1"))
> abort ();
>
> result = strcasecmp ("été", "Hiver");
> printf ("result=%i\n", result);
>
> return (result < 0) ? 0 : 1;
> }
>
> Under French collation conventions, letter `é' (`e' with acute) comes
> before `h'. Thus, the word "été" should be "lower than" the word
> "hiver". `strcoll ()' returns the right answer (a negative number) but
> `strcasecmp ()' wrongfully returns a positive number, regardless of
> whether "hiver" is spelt with a capital `H' or not.
>
> Is this a bug or am I missing something?
>
> Thanks,
> Ludovic.
>
I think this function is not locale-aware, so it compares the characters'
integral values, which naturally produces a positive result.
http://opengroup.org/onlinepubs/007908799/xsh/strcasecmp.html
In the POSIX locale, strcasecmp() and strncasecmp() do upper to lower
conversions, then a byte comparison. The results are unspecified in
other locales.
Since é is 0xe9 in ISO-8859-1, an extended (8-bit) character value greater
than both h (0x68) and H (0x48), this doesn't seem to be a bug to me.
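The byte-level reasoning can be reproduced without any locale. The following is a minimal sketch, assuming the comparison folds only ASCII letters and then compares unsigned byte values; `ascii_lower' and `ascii_casecmp' are hypothetical names and this is not glibc's actual implementation. The string "\xe9t\xe9" spells "été" in ISO-8859-1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fold only ASCII 'A'..'Z' to lower case and
   leave every other byte, including 0xe9, untouched.  */
static unsigned char
ascii_lower (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? (unsigned char) (c - 'A' + 'a') : c;
}

/* Compare A and B byte by byte after ASCII-only case folding.
   Returns a negative, zero, or positive value like strcmp ().  */
int
ascii_casecmp (const char *a, const char *b)
{
  const unsigned char *p = (const unsigned char *) a;
  const unsigned char *q = (const unsigned char *) b;

  assert (a != NULL && b != NULL);

  /* Advance while both strings agree; stop at the end of A or at
     the first differing byte.  */
  while (*p != '\0' && ascii_lower (*p) == ascii_lower (*q))
    {
      p++;
      q++;
    }
  return (int) ascii_lower (*p) - (int) ascii_lower (*q);
}
```

With these definitions, ascii_casecmp ("\xe9t\xe9", "Hiver") compares 0xe9 with 0x68 at the first byte and therefore returns a positive value, which matches the result reported above.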
HTH,
Reshat.
- --
My public GPG key (ID 0x262839AF) is at: http://keyserver.veridis.com:11371
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
iD8DBQFFOBCSO75ytyYoOa8RAtvJAJ4hh3k83W6rXdW5OQk1AzbZmybKDwCfWM98
y+onNxS2erMCG+Rc3S+sMmk=
=PuTh
-----END PGP SIGNATURE-----
* Re: Weird case-insensitive collation
2006-10-19 23:54 ` "Reshat Sabiq (Reşat)"
@ 2006-10-20 9:41 ` Ludovic Courtès
2006-10-23 21:59 ` Ludovic Courtès
1 sibling, 0 replies; 5+ messages in thread
From: Ludovic Courtès @ 2006-10-20 9:41 UTC (permalink / raw)
To: Reshat Sabiq (Reşat); +Cc: libc-locales
Hi,
"Reshat Sabiq (Reşat)" <tatar.iqtelif.i18n@gmail.com> writes:
> I think this function is not locale-aware, so it compares characters'
> integral value, which naturally produces a positive.
> http://opengroup.org/onlinepubs/007908799/xsh/strcasecmp.html
> In the POSIX locale, strcasecmp() and strncasecmp() do upper to lower
> conversions, then a byte comparison. The results are unspecified in
> other locales.
It is true that POSIX is not so clear wrt. locale-dependence. However,
while it says how the function should behave under the `POSIX' locale,
it does not explicitly mention how it should deal with other locales.
The glibc manual is more explicit in that respect [0]:
In the standard "C" locale the characters Ä and ä do not match but in a
locale which regards these characters as parts of the alphabet they do
match.
This seems to imply that `strcasecmp ()' knows how to deal with other
locales.
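One way to check what the manual describes is to look at case folding alone. The sketch below assumes an ISO-8859-1 encoding, in which Ä is byte 0xC4 and ä is byte 0xE4; `bytes_match_ignoring_case' is a hypothetical helper name, not a libc function.

```c
#include <ctype.h>

/* Hypothetical helper: report whether two bytes are equal once both
   have been passed through tolower ().  The outcome depends on the
   LC_CTYPE category of the current locale.  */
int
bytes_match_ignoring_case (unsigned char a, unsigned char b)
{
  return tolower (a) == tolower (b);
}
```

In the default "C" locale, bytes_match_ignoring_case (0xC4, 0xE4) is false because `tolower ()' leaves both bytes unchanged. After a successful setlocale (LC_ALL, "fr_FR.ISO-8859-1") it would be expected to become true, provided that locale is installed. This only concerns case matching; it says nothing about the order in which differing strings are sorted.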
Thanks,
Ludovic.
[0] http://www.gnu.org/software/libc/manual/html_node/String_002fArray-Comparison.html#String_002fArray-Comparison
* Re: Weird case-insensitive collation
2006-10-19 23:54 ` "Reshat Sabiq (Reşat)"
2006-10-20 9:41 ` Ludovic Courtès
@ 2006-10-23 21:59 ` Ludovic Courtès
2006-10-29 21:18 ` "Reshat Sabiq (Reşat)"
1 sibling, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2006-10-23 21:59 UTC (permalink / raw)
To: Reshat Sabiq (Reşat); +Cc: libc-locales
Hi,
"Reshat Sabiq (Reşat)" <tatar.iqtelif.i18n@gmail.com> writes:
> I think this function is not locale-aware, so it compares characters'
> integral value, which naturally produces a positive.
> http://opengroup.org/onlinepubs/007908799/xsh/strcasecmp.html
> In the POSIX locale, strcasecmp() and strncasecmp() do upper to lower
> conversions, then a byte comparison. The results are unspecified in
> other locales.
It occurred to me that I had not paid enough attention to that last
sentence from POSIX, sorry about that.
Still, I find it confusing that the glibc manual specifies (using an
example) the behavior of `strcasecmp ()' with other locales.
Thanks,
Ludovic.
* Re: Weird case-insensitive collation
2006-10-23 21:59 ` Ludovic Courtès
@ 2006-10-29 21:18 ` "Reshat Sabiq (Reşat)"
0 siblings, 0 replies; 5+ messages in thread
From: "Reshat Sabiq (Reşat)" @ 2006-10-29 21:18 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: libc-locales
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ludovic Courtès wrote:
> Hi,
>> http://opengroup.org/onlinepubs/007908799/xsh/strcasecmp.html
>> In the POSIX locale, strcasecmp() and strncasecmp() do upper to lower
>> conversions, then a byte comparison. The results are unspecified in
>> other locales.
>
> It occurred to me that I had not paid enough attention to that last
> sentence from POSIX, sorry about that.
>
> Still, I find it confusing that the glibc manual specifies (using an
> example) the behavior of `strcasecmp ()' with other locales.
>
> Thanks,
> Ludovic.
>
Yes, after your previous message I was thinking that it could be a doc
bug. If so, it should be logged, IMHO.
HTH,
Reshat.
- --
My public GPG key (ID 0x262839AF) is at: http://keyserver.veridis.com:11371
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
iD8DBQFFRRqYO75ytyYoOa8RAizgAKCiKGeBbbiIWv9UTPpPgnHiXzANwQCgiFhG
H7kVoge+m0tk6Rx1QngY+c4=
=dfXh
-----END PGP SIGNATURE-----