public inbox for cygwin@cygwin.com
From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: cygwin@cygwin.com
Subject: Re: Bug in collation functions?
Date: Mon, 02 Nov 2015 11:14:00 -0000
Message-ID: <20151102111358.GZ5319@calimero.vinschen.de>
In-Reply-To: <5634F6BA.7070301@cornell.edu>

On Oct 31 13:13, Ken Brown wrote:
> On 10/30/2015 10:07 AM, Ken Brown wrote:
> >Hi Corinna,
> >
> >On 10/30/2015 8:03 AM, Corinna Vinschen wrote:
> >>On Oct 29 18:21, Ken Brown wrote:
> >>>The fallback I had in mind is to return the shorter string if they have
> >>>different lengths and otherwise to revert to wcscmp.
> >>
> >>I had a longer look into this suggestion and the below code and it took
> >>me some time to find out what bugged me with it:
> >>
> >>What about str/wcsxfrm?
> >>
> >>Per POSIX, calling strcmp on the result of strxfrm is equivalent to
> >>calling strcoll (analogously for wcs*).  If you extend *coll to perform an
> >>extra check on the length, you will have cases in which the above rule
> >>fails.  You can't perform the length test on the result of *xfrm and
> >>expect the same result as in *coll.
> >>
> >>In fact, when calling LCMapStringW with NORM_IGNORESYMBOLS (you would
> >>have to do this anyway if we add this flag in *coll), the resulting
> >>transformed strings created from the input strings "11" and "1.1" would
> >>be identical, so a length test on the xfrm string is not meaningful at
> >>all.
> >>
> >>The bottom line is, afaics, we must make sure that CompareStringW and
> >>LCMapStringW are called the same way, and their result/output has to be
> >>returned to the caller.  Performing an extra check in *coll which can't
> >>be reliably performed in *xfrm is not feasible.
> >>
> >>Does that make sense?
> >
> >Yes, I see the problem, and I don't see a good way around it.  So I
> >think we probably have to leave things as they are and live with the
> >fact that we can't do comparisons that ignore whitespace and punctuation.
> >
> >The alternative of allowing str/wcscoll to return 0 on unequal strings
> >doesn't seem feasible in view of Eric's comments.
> 
> I have one other idea.  What would you think of defining a function
> cygwin_strcoll that's like strcoll but with an extra bool parameter
> 'ignoresymbols'?  If ignoresymbols = false, this would be the same as
> strcoll.  If ignoresymbols = true, this would use NORM_IGNORESYMBOLS with
> the fallback I suggested.
> 
> That way applications that prefer to be more glibc-compatible and don't need
> strxfrm could do something like
> 
>   #define strcoll(A,B) cygwin_strcoll ((A), (B), true)
> 
> If you think this is reasonable, I'll submit a patch.  If not, no problem.

No, I don't think this is feasible.  Given Eric's comments, can an
application ever expect that strcoll behaves exactly as on Linux?  For
portability reasons, it has to expect different results on different
platforms.  It only makes sense to fix the behaviour if the result is
POSIXly incorrect, IMHO.
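
To illustrate the str/wcsxfrm objection quoted above, here is a minimal
sketch in portable C.  It does not call CompareStringW or LCMapStringW;
toy_xfrm is a hypothetical stand-in that just drops punctuation and
whitespace, which is an assumption about what a NORM_IGNORESYMBOLS sort
key roughly does to "11" and "1.1".  All function names are made up for
this example, and it uses narrow strings with strcmp where the proposal
spoke of wcscmp.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a sort key that ignores symbols: copy src
   to dst without punctuation and whitespace.  Like strxfrm, it returns
   the full key length and writes at most n bytes including the NUL. */
static size_t toy_xfrm(char *dst, const char *src, size_t n)
{
    size_t len = 0;
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if (ispunct(c) || isspace(c))
            continue;               /* symbol: contributes nothing */
        if (len + 1 < n)
            dst[len] = (char)c;
        len++;
    }
    if (n > 0)
        dst[len < n ? len : n - 1] = '\0';
    return len;
}

/* What a caller relying on the POSIX rule does: strcmp on the keys. */
static int cmp_via_xfrm(const char *a, const char *b)
{
    char ka[64], kb[64];
    toy_xfrm(ka, a, sizeof ka);
    toy_xfrm(kb, b, sizeof kb);
    return strcmp(ka, kb);
}

/* The proposed *coll: compare ignoring symbols; on a tie the shorter
   input sorts first, and equal lengths revert to a plain strcmp. */
static int coll_with_fallback(const char *a, const char *b)
{
    int r = cmp_via_xfrm(a, b);
    if (r != 0)
        return r;
    size_t la = strlen(a), lb = strlen(b);
    if (la != lb)
        return la < lb ? -1 : 1;
    return strcmp(a, b);
}
```

With these definitions, cmp_via_xfrm ("11", "1.1") is 0 because both
keys are "11", while coll_with_fallback ("11", "1.1") is negative.  The
tie-breaker needs the original strings, and those are no longer
available once only the transformed keys are compared.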


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


