Subject: Re: Bug in collation functions?
To: cygwin@cygwin.com
References: <563148AF.1000502@cornell.edu> <5631996D.7040908@redhat.com> <20151029075050.GE5319@calimero.vinschen.de> <20151029083057.GH5319@calimero.vinschen.de> <56321815.7000203@cornell.edu> <20151029153516.GJ5319@calimero.vinschen.de>
From: Eric Blake
Message-ID: <56324089.2090702@redhat.com>
Date: Thu, 29 Oct 2015 16:17:00 -0000
In-Reply-To: <20151029153516.GJ5319@calimero.vinschen.de>

On 10/29/2015 09:35 AM, Corinna Vinschen wrote:
>>> Right now Cygwin calls CompareStringW with dwCmpFlags set to 0, but there
>>> are flags like NORM_IGNORENONSPACE, NORM_IGNORESYMBOLS.  I'm open to a
>>> discussion how to change the settings to more closely resemble the rules
>>> on Linux.
>>>
>>> E.g. wcscoll simply calls wcscmp rather than CompareStringW for the
>>> C/POSIX locale anyway.  So, would it make sense to set the flags to
>>> NORM_IGNORESYMBOLS in other locales?
>>
>> I think so.  That's what the native Windows build of emacs does in this
>> situation.
>
> Is that all it's doing?  I'm asking because using NORM_IGNORESYMBOLS
> does not exactly resemble the behaviour on Linux on my W10 box:
>
>       "11" > "1.1" in POSIX locale
>   !!! "11" > "1.1" in en_US.UTF-8 locale
>       "11" > "1 2" in POSIX locale
>       "11" < "1 2" in en_US.UTF-8 locale
>

I'm not sure if blindly enabling the flags for all locales makes sense,
though.
I haven't audited the glibc locales to know for sure, but my impression
is that it is up to the locale author whether whitespace affects
collation.  The author of glibc's en_US.UTF-8 may have chosen to make
whitespace insignificant, but I can't rule out that some other locales
in glibc still treat whitespace as significant.

POSIX has a notion of writing your own locale definition, and glibc
definitely supports that (although I haven't personally tried doing it):
you can set your OWN collation rules while inheriting the bulk of the
work from an existing locale.  So in glibc, it is possible to have a
locale similar to en_US.UTF-8 but where whitespace IS significant in
collation.

I know cygwin isn't there yet (we expose the Windows locale, but do not
let you define your own).  This seems like the sort of thing where we
might want support for user-defined locales, compiled into a binary
format, with cygwin opening the binary locale definition to decide which
flags to use for the locale in effect.  But that sounds like a LOT of
work, for a questionable amount of gain.
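
To make the "flags depend on the locale" idea concrete, here is a rough
sketch of a per-locale lookup.  It is not what cygwin does today, and
everything in it except the two NORM_* flag names is hypothetical: the
struct, the table, the function names, and above all the table contents,
which I have not audited against glibc.  The table stands in for whatever
a compiled locale definition would eventually provide, and the flag
values are repeated from the Windows SDK header only so the sketch
compiles without it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values as found in the Windows SDK header winnls.h; repeated here only
   so this sketch builds without the Windows headers. */
#ifndef NORM_IGNORESYMBOLS
#define NORM_IGNORENONSPACE 0x00000002
#define NORM_IGNORESYMBOLS  0x00000004
#endif

/* Hypothetical per-locale entry: the dwCmpFlags value that would be
   handed to CompareStringW when collating in that locale. */
struct collate_flags_entry {
    const char   *locale;     /* locale name as given to setlocale() */
    unsigned long cmp_flags;  /* candidate dwCmpFlags for that locale */
};

/* Placeholder contents, NOT audited against glibc.  A real version would
   load this from the locale definition instead of hard-coding it. */
static const struct collate_flags_entry collate_flags_table[] = {
    { "en_US.UTF-8", NORM_IGNORESYMBOLS },
};

/* C/POSIX never reaches CompareStringW (wcscoll uses wcscmp there),
   so no flags apply to it at all. */
static int is_c_locale(const char *locale)
{
    assert(locale != NULL);
    return strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0;
}

/* Look up the flags for a locale; unknown locales keep the current
   behaviour of dwCmpFlags == 0. */
static unsigned long collate_flags_for(const char *locale)
{
    size_t i;

    assert(locale != NULL);
    for (i = 0; i < sizeof collate_flags_table / sizeof collate_flags_table[0]; i++)
        if (strcmp(collate_flags_table[i].locale, locale) == 0)
            return collate_flags_table[i].cmp_flags;
    return 0;
}
```

A caller would check is_c_locale() first and fall back to wcscmp, and
otherwise pass collate_flags_for(name) as dwCmpFlags.  Defaulting unknown
locales to 0 keeps today's behaviour for anything nobody has looked at.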

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org