public inbox for cygwin@cygwin.com
From: Eric Blake <eblake@redhat.com>
To: cygwin@cygwin.com
Subject: Re: Bug in collation functions?
Date: Thu, 29 Oct 2015 16:17:00 -0000
Message-ID: <56324089.2090702@redhat.com>
In-Reply-To: <20151029153516.GJ5319@calimero.vinschen.de>

On 10/29/2015 09:35 AM, Corinna Vinschen wrote:

>>> Right now Cygwin calls CompareStringW with dwCmpFlags set to 0, but there
>>> are flags like NORM_IGNORENONSPACE, NORM_IGNORESYMBOLS.  I'm open to a
>>> discussion how to change the settings to more closely resemble the rules
>>> on Linux.
>>>
>>> E.g.  wcscoll simply calls wcscmp rather than CompareStringW for the
>>> C/POSIX locale anyway.  So, would it make sense to set the flags to
>>> NORM_IGNORESYMBOLS in other locales?
>>
>> I think so.  That's what the native Windows build of emacs does in this
>> situation.
> 
> Is that all it's doing?  I'm asking because using NORM_IGNORESYMBOLS
> does not exactly resemble the behaviour on Linux on my W10 box:
> 
>     "11" > "1.1" in POSIX locale
> !!! "11" > "1.1" in en_US.UTF-8 locale
>     "11" > "1 2" in POSIX locale
>     "11" < "1 2" in en_US.UTF-8 locale
> 

I'm not sure that blindly enabling the flags for all locales makes sense,
though.  I haven't audited the glibc locales to know for sure, but my
impression is that it is up to the locale author whether whitespace
affects collation.  The author of glibc's en_US.UTF-8 may have chosen to
ignore it, but I can't rule out that some other glibc locales still
treat whitespace as significant.
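
As a rough illustration of how one might check this locale by locale,
the sketch below compares two strings with strcoll() under a named
locale and restores the previous LC_COLLATE setting afterwards.  The
helper name collate_in_locale is made up for this example, and the
result for anything other than the C/POSIX locale depends entirely on
the locale definition installed on the system.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: compare a and b with strcoll() under the
 * LC_COLLATE rules of locale_name.
 * Returns 0 and stores -1, 0 or 1 in *sign on success,
 * or -1 if the locale is unavailable (or memory runs out).
 */
int collate_in_locale(const char *locale_name,
                      const char *a, const char *b, int *sign)
{
    /* Copy the current setting, because the string returned by
       setlocale() may be overwritten by the next setlocale() call. */
    const char *current = setlocale(LC_COLLATE, NULL);
    char *saved = NULL;
    if (current != NULL) {
        saved = malloc(strlen(current) + 1);
        if (saved == NULL)
            return -1;
        strcpy(saved, current);
    }

    if (setlocale(LC_COLLATE, locale_name) == NULL) {
        free(saved);
        return -1;              /* locale not installed */
    }

    int r = strcoll(a, b);
    *sign = (r > 0) - (r < 0);  /* normalise to -1, 0 or 1 */

    /* Put the caller's collation locale back. */
    if (saved != NULL) {
        setlocale(LC_COLLATE, saved);
        free(saved);
    }
    return 0;
}
```

Calling it with "C" and the pairs ("11", "1.1") and ("11", "1 2")
yields a positive sign, since strcoll() in the C locale orders strings
as strcmp() does.  Running the same pairs under other locale names
would show which of them ignore punctuation or whitespace.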

POSIX has a notion of writing your own locale definition - and glibc
definitely supports that (although I haven't personally tried doing it),
where you can set your OWN collation rules while inheriting the bulk of
the work from an existing locale.  So in glibc, it is possible to have
a locale similar to en_US.UTF-8 but where whitespace IS significant in
collation.  I know cygwin isn't there yet (we expose the Windows locale,
but do not let you define your own).

This seems like the sort of thing where maybe we'd want support for
user-defined locales, compiled into a binary format, and then cygwin
opens the binary locale definition for deciding which flags to use
according to the locale being used.  But that sounds like a LOT of work,
for a questionable amount of gain.
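
To make the idea concrete, here is a minimal sketch of a per-locale
flag lookup, with a hard-coded table standing in for the compiled
binary locale definition.  Everything in it is hypothetical: the names
collate_flags_entry, collate_flags_table and collate_flags_for, and the
single table entry, which only mirrors the NORM_IGNORESYMBOLS proposal
discussed above and is not a statement about what en_US.UTF-8 should
use.  The two flag values match the Windows SDK definitions and are
redefined only so the sketch compiles without <windows.h>.

```c
#include <stddef.h>
#include <string.h>

/* Values as in the Windows SDK; redefined here only so that the
   sketch builds on systems without <windows.h>. */
#ifndef NORM_IGNORENONSPACE
#define NORM_IGNORENONSPACE 0x00000002
#endif
#ifndef NORM_IGNORESYMBOLS
#define NORM_IGNORESYMBOLS  0x00000004
#endif

/* Hypothetical table entry: a locale name and the dwCmpFlags value
   that would be passed to CompareStringW for that locale. */
struct collate_flags_entry {
    const char   *locale;
    unsigned long flags;
};

/* Illustrative contents only; a real implementation would load this
   from a locale definition rather than hard-code it. */
static const struct collate_flags_entry collate_flags_table[] = {
    { "en_US.UTF-8", NORM_IGNORESYMBOLS },
};

/* Return the comparison flags for a locale.  Unknown locales get 0,
   which matches the current behaviour of passing dwCmpFlags = 0. */
unsigned long collate_flags_for(const char *locale)
{
    size_t n = sizeof collate_flags_table / sizeof collate_flags_table[0];

    if (locale == NULL)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(collate_flags_table[i].locale, locale) == 0)
            return collate_flags_table[i].flags;
    }
    return 0;
}
```

With this table, collate_flags_for("en_US.UTF-8") returns
NORM_IGNORESYMBOLS and any unlisted locale returns 0.  The C/POSIX
locale would not need an entry at all, since wcscoll already calls
wcscmp there instead of CompareStringW.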

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

