From: Corinna Vinschen
To: cygwin@cygwin.com
Date: Fri, 30 Oct 2015 21:13:00 -0000
Subject: Re: Bug in collation functions?

On Oct 30 10:07, Ken Brown wrote:
> Hi Corinna,
>
> On 10/30/2015 8:03 AM, Corinna Vinschen wrote:
> >On Oct 29 18:21, Ken Brown wrote:
> >>The fallback I had in mind is to return the shorter string if they have
> >>different lengths and otherwise to revert to wcscmp.
> >
> >I had a longer look into this suggestion and the below code and it took
> >me some time to find out what bugged me with it:
> >
> >What about str/wcsxfrm?
> >
> >Per POSIX, calling strcmp on the result of strxfrm is equivalent to
> >calling strcoll (analogue with wcs*).  If you extend *coll to perform an
> >extra check on the length, you will have cases in which the above rule
> >fails.  You can't perform the length test on the result of *xfrm and
> >expect the same result as in *coll.
> >
> >In fact, when calling LCMapStringW with NORM_IGNORESYMBOLS (you would
> >have to do this anyway if we add this flag in *coll), the resulting
> >transformed strings created from the input strings "11" and "1.1" would
> >be identical, so a length test on the xfrm string is not meaningful at
> >all.
> >
> >The bottom line is, afaics, we must make sure that CompareStringW and
> >LCMapStringW are called the same way, and their result/output has to be
> >returned to the caller.  Performing an extra check in *coll which can't
> >be reliably performed in *xfrm is not feasible.
> >
> >Does that make sense?
>
> Yes, I see the problem, and I don't see a good way around it.  So I think we
> probably have to leave things as they are and live with the fact that we
> can't do comparisons that ignore whitespace and punctuation.
>
> The alternative of allowing str/wcscoll to return 0 on unequal strings
> doesn't seem feasible in view of Eric's comments.
>
> What about the other issue I raised: Should setlocale return null to
> indicate an error if it's given an invalid locale name like en_DE.UTF-8?

Huh.  Interesting.  You're running Windows 10, right?

After some digging it turns out there's a bug in Windows 10:
LocaleNameToLCID() does *not* fail and return an error if it doesn't
know a locale.  That would be too simple, I guess.  Rather, it returns
LOCALE_CUSTOM_UNSPECIFIED (0x1000), so all unknown locales are now
treated as custom locales.  Duh!

I fear the answer when trying to report this.  Probably it's a feature...

I applied a patch to work around this feature.
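
To make the strxfrm argument from the quoted exchange concrete, here is a small, self-contained C sketch. It does not call LCMapStringW or CompareStringW. Instead, the hypothetical `toy_xfrm()` stands in for a symbol-ignoring transform (in the spirit of NORM_IGNORESYMBOLS) by simply dropping ASCII punctuation, and the hypothetical `toy_coll_with_fallback()` adds the proposed length check on top of it. For "11" and "1.1" the transformed strings are identical, so strcmp on them returns 0, while the collation function returns nonzero: the POSIX rule that strcmp on the xfrm output must agree with *coll breaks.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUF 64

/* Hypothetical stand-in for a symbol-ignoring transform: copy SRC into
   DST, which holds CAP bytes, dropping ASCII punctuation.  The output is
   truncated if it does not fit. */
static void toy_xfrm(char *dst, size_t cap, const char *src)
{
    size_t n = 0;

    assert(cap > 0);
    for (; *src != '\0' && n + 1 < cap; src++)
        if (!ispunct((unsigned char) *src))
            dst[n++] = *src;
    dst[n] = '\0';
}

/* Hypothetical *coll with the proposed fallback: compare the transformed
   strings first; on a tie the shorter original sorts first, and only
   strings of equal length fall back to strcmp. */
static int toy_coll_with_fallback(const char *a, const char *b)
{
    char xa[TOY_BUF], xb[TOY_BUF];
    size_t la, lb;
    int r;

    toy_xfrm(xa, sizeof xa, a);
    toy_xfrm(xb, sizeof xb, b);
    r = strcmp(xa, xb);
    if (r != 0)
        return r;
    la = strlen(a);
    lb = strlen(b);
    if (la != lb)
        return la < lb ? -1 : 1;
    return strcmp(a, b);
}

static int sign(int x)
{
    return (x > 0) - (x < 0);
}

/* The POSIX rule: strcmp on the transformed strings must order A and B
   the same way the collation function does.  Returns 1 if it holds. */
static int xfrm_matches_coll(const char *a, const char *b)
{
    char xa[TOY_BUF], xb[TOY_BUF];

    toy_xfrm(xa, sizeof xa, a);
    toy_xfrm(xb, sizeof xb, b);
    return sign(toy_coll_with_fallback(a, b)) == sign(strcmp(xa, xb));
}
```

With this sketch, `xfrm_matches_coll("abc", "abd")` holds, but `xfrm_matches_coll("11", "1.1")` does not: the tie-break consults the original length, which the transformed keys no longer carry.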
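
The workaround described above amounts to no longer trusting a nonzero return value from LocaleNameToLCID() on its own. Below is a minimal sketch of such a check; it is not the actual Cygwin patch. To keep it portable and testable off Windows, `toy_lcid_t` and `TOY_LOCALE_CUSTOM_UNSPECIFIED` are hypothetical stand-ins for LCID and LOCALE_CUSTOM_UNSPECIFIED, and `fake_w10_lookup()` is a hypothetical stub that mimics the behaviour reported here. Real code would call LocaleNameToLCID() with a wide-character locale name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins so the sketch builds without <windows.h>. */
typedef unsigned long toy_lcid_t;               /* LCID is a 32-bit DWORD */
#define TOY_LOCALE_CUSTOM_UNSPECIFIED 0x1000UL  /* LOCALE_CUSTOM_UNSPECIFIED */

/* A name -> LCID lookup, e.g. a wrapper around LocaleNameToLCID(). */
typedef toy_lcid_t (*lcid_lookup_fn)(const char *name);

/* Hypothetical stub mimicking the reported Windows 10 behaviour: known
   names map to their LCID, unknown ones yield LOCALE_CUSTOM_UNSPECIFIED
   instead of the documented failure value 0. */
static toy_lcid_t fake_w10_lookup(const char *name)
{
    if (strcmp(name, "en-US") == 0)
        return 0x0409UL;
    if (strcmp(name, "de-DE") == 0)
        return 0x0407UL;
    return TOY_LOCALE_CUSTOM_UNSPECIFIED;
}

/* Resolve NAME via LOOKUP and return its LCID, or 0 if the locale should
   be treated as unknown.  Besides 0, the custom placeholder is rejected
   too, so an unknown name fails instead of silently becoming a custom
   locale. */
static toy_lcid_t resolve_lcid(lcid_lookup_fn lookup, const char *name)
{
    toy_lcid_t lcid;

    assert(lookup != NULL && name != NULL);
    lcid = lookup(name);
    if (lcid == 0 || lcid == TOY_LOCALE_CUSTOM_UNSPECIFIED)
        return 0;
    return lcid;
}
```

The trade-off of this approach is that an installed custom locale which also reports LOCALE_CUSTOM_UNSPECIFIED may be rejected as well; whether that matters depends on which locales need to be supported.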

Thanks for the testcase, btw :)


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat