From: Reshat Sabiq (Reşat)
To: Ludovic Courtès
CC: libc-locales@sources.redhat.com
Date: Thu, 19 Oct 2006 23:54:00 -0000
Subject: Re: Weird case-insensitive collation
Message-ID: <45381092.2070401@gmail.com>

Ludovic Courtès wrote:
> Hi,
>
> `strcasecmp ()' behaves wrongly under the `fr_FR' locale. Consider the
> following example program:
>
> #include <locale.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <strings.h>
>
> int
> main (int argc, char *argv[])
> {
>   int result;
>
>   if (!setlocale (LC_ALL, "fr_FR.ISO-8859-1"))
>     abort ();
>
>   result = strcasecmp ("été", "Hiver");
>   printf ("result=%i\n", result);
>
>   return (result < 0) ? 0 : 1;
> }
>
> Under French collation conventions, letter `é' (`e' with acute) comes
> before `h'. Thus, the word "été" should be "lower than" the word
> "hiver". `strcoll ()' returns the right answer (a negative number) but
> `strcasecmp ()' wrongfully returns a positive number, regardless of
> whether "hiver" is spelt with a capital `H' or not.
>
> Is this a bug or am I missing something?
>
> Thanks,
> Ludovic.
>

I think this function does not follow the locale's collation rules: it
folds case and then compares the characters' integral values, which here
naturally produces a positive result.

http://opengroup.org/onlinepubs/007908799/xsh/strcasecmp.html says:

  In the POSIX locale, strcasecmp() and strncasecmp() do upper to lower
  conversions, then a byte comparison. The results are unspecified in
  other locales.

Since é is 0xe9 in ISO-8859-1, an extended (8-bit) character greater than
both h (0x68) and H (0x48), this doesn't seem to be a bug to me.

HTH,
Reshat.