public inbox for libc-locales@sourceware.org
* Weird case-insensitive collation
@ 2006-10-19 11:35 Ludovic Courtès
  2006-10-19 23:54 ` "Reshat Sabiq (Reşat)"
  0 siblings, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2006-10-19 11:35 UTC (permalink / raw)
  To: libc-locales

Hi,

`strcasecmp ()' behaves wrongly under the `fr_FR' locale.  Consider the
following example program:

  #include <stdlib.h>
  #include <stdio.h>
  #include <locale.h>
  #include <strings.h>

  int
  main (int argc, char *argv[])
  {
    int result;

    if (!setlocale (LC_ALL, "fr_FR.ISO-8859-1"))
      abort ();

    result = strcasecmp ("été", "Hiver");
    printf ("result=%i\n", result);

    return (result < 0) ? 0 : 1;
  }

Under French collation conventions, letter `é' (`e' with acute) comes
before `h'.  Thus, the word "été" should be "lower than" the word
"hiver".  `strcoll ()' returns the right answer (a negative number) but
`strcasecmp ()' wrongfully returns a positive number, regardless of
whether "hiver" is spelt with a capital `H' or not.

Is this a bug or am I missing something?

Thanks,
Ludovic.

* Re: Weird case-insensitive collation
  2006-10-19 11:35 Weird case-insensitive collation Ludovic Courtès
@ 2006-10-19 23:54 ` "Reshat Sabiq (Reşat)"
  2006-10-20  9:41   ` Ludovic Courtès
  2006-10-23 21:59   ` Ludovic Courtès
  0 siblings, 2 replies; 5+ messages in thread
From: "Reshat Sabiq (Reşat)" @ 2006-10-19 23:54 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: libc-locales

Ludovic Courtès wrote:
> Hi,
> 
> `strcasecmp ()' behaves wrongly under the `fr_FR' locale.  Consider the
> following example program:
> 
>   #include <stdlib.h>
>   #include <stdio.h>
>   #include <locale.h>
>   #include <strings.h>
> 
>   int
>   main (int argc, char *argv[])
>   {
>     int result;
> 
>     if (!setlocale (LC_ALL, "fr_FR.ISO-8859-1"))
>       abort ();
> 
>     result = strcasecmp ("été", "Hiver");
>     printf ("result=%i\n", result);
> 
>     return (result < 0) ? 0 : 1;
>   }
> 
> Under French collation conventions, letter `é' (`e' with acute) comes
> before `h'.  Thus, the word "été" should be "lower than" the word
> "hiver".  `strcoll ()' returns the right answer (a negative number) but
> `strcasecmp ()' wrongfully returns a positive number, regardless of
> whether "hiver" is spelt with a capital `H' or not.
> 
> Is this a bug or am I missing something?
> 
> Thanks,
> Ludovic.
> 
I think this function is not locale-aware, so it compares characters'
integral value, which naturally produces a positive.
http://opengroup.org/onlinepubs/007908799/xsh/strcasecmp.html
In the POSIX locale, strcasecmp() and strncasecmp() do upper to lower
conversions, then a byte comparison. The results are unspecified in
other locales.

Since é is 0xe9 in ISO-8859-1, an Extended ASCII value greater than both h
(0x68) and H (0x48), this doesn't seem to be a bug to me.

HTH,
Reshat.


* Re: Weird case-insensitive collation
  2006-10-19 23:54 ` "Reshat Sabiq (Reşat)"
@ 2006-10-20  9:41   ` Ludovic Courtès
  2006-10-23 21:59   ` Ludovic Courtès
  1 sibling, 0 replies; 5+ messages in thread
From: Ludovic Courtès @ 2006-10-20  9:41 UTC (permalink / raw)
  To: Reshat Sabiq (Reşat); +Cc: libc-locales

Hi,

"Reshat Sabiq (Reşat)" <tatar.iqtelif.i18n@gmail.com> writes:

> I think this function is not locale-aware, so it compares characters'
> integral value, which naturally produces a positive.
> http://opengroup.org/onlinepubs/007908799/xsh/strcasecmp.html
> In the POSIX locale, strcasecmp() and strncasecmp() do upper to lower
> conversions, then a byte comparison. The results are unspecified in
> other locales.

It is true that POSIX is not very clear about locale dependence:
while it says how the function should behave under the `POSIX' locale,
it does not explicitly say how it should deal with other locales.

The glibc manual is more explicit in that respect [0]:

  In the standard "C" locale the characters Ä and ä do not match but in a
  locale which regards these characters as parts of the alphabet they do
  match.

This seems to imply that `strcasecmp ()' knows how to deal with other
locales.

Thanks,
Ludovic.

[0] http://www.gnu.org/software/libc/manual/html_node/String_002fArray-Comparison.html#String_002fArray-Comparison

* Re: Weird case-insensitive collation
  2006-10-19 23:54 ` "Reshat Sabiq (Reşat)"
  2006-10-20  9:41   ` Ludovic Courtès
@ 2006-10-23 21:59   ` Ludovic Courtès
  2006-10-29 21:18     ` "Reshat Sabiq (Reşat)"
  1 sibling, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2006-10-23 21:59 UTC (permalink / raw)
  To: Reshat Sabiq (Reşat); +Cc: libc-locales

Hi,

"Reshat Sabiq (Reşat)" <tatar.iqtelif.i18n@gmail.com> writes:

> I think this function is not locale-aware, so it compares characters'
> integral value, which naturally produces a positive.
> http://opengroup.org/onlinepubs/007908799/xsh/strcasecmp.html
> In the POSIX locale, strcasecmp() and strncasecmp() do upper to lower
> conversions, then a byte comparison. The results are unspecified in
> other locales.

It occurred to me that I had not paid enough attention to that last
sentence from POSIX, sorry about that.

Still, I find it confusing that the glibc manual specifies (using an
example) the behavior of `strcasecmp ()' with other locales.

Thanks,
Ludovic.

* Re: Weird case-insensitive collation
  2006-10-23 21:59   ` Ludovic Courtès
@ 2006-10-29 21:18     ` "Reshat Sabiq (Reşat)"
  0 siblings, 0 replies; 5+ messages in thread
From: "Reshat Sabiq (Reşat)" @ 2006-10-29 21:18 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: libc-locales

Ludovic Courtès wrote:
> Hi,
>> http://opengroup.org/onlinepubs/007908799/xsh/strcasecmp.html
>> In the POSIX locale, strcasecmp() and strncasecmp() do upper to lower
>> conversions, then a byte comparison. The results are unspecified in
>> other locales.
> 
> It occurred to me that I had not paid enough attention to that last
> sentence from POSIX, sorry about that.
> 
> Still, I find it confusing that the glibc manual specifies (using an
> example) the behavior of `strcasecmp ()' with other locales.
> 
> Thanks,
> Ludovic.
> 
Yes, I was thinking after your previous message that it could be a doc
bug. If so, it should be logged, IMHO.

HTH,
Reshat.

