On 01/29/2011 09:01 AM, Corinna Vinschen wrote:
>> So, using UTF-16 surrogate encodings for characters outside the basic
>> plane violates POSIX, but it's the best we can do for those characters.
>
> Right, and we discussed this already on this list.  Or the developer
> list, I don't remember.  Maybe we should have stuck to the base plane
> and only used UCS-2 to be more POSIX compatible.

The burden is on the application, not on cygwin.  If the application
wants POSIX behavior, then it obeys __STDC_ISO_10646__ and uses ONLY
characters from the basic plane (no surrogates), at which point its use
of wchar_t fits the POSIX definition (one wchar_t per character).  The
moment it passes a surrogate, it is no longer honoring the restriction
documented by __STDC_ISO_10646__, so it is no longer under the rules of
POSIX, and cygwin can do whatever it wants.  In this case, QoI demands
that we honor surrogates to the best of our ability for full UTF-16
support, so you can have multi-wchar_t characters just as you already
have multi-byte UTF-8 characters stored in char.

In other words, cygwin IS being POSIX-compliant by advertising only the
Unicode 4.0 character set in __STDC_ISO_10646__, while still supporting
Unicode 5.2 (should we upgrade to Unicode 6.0?) as an extension when you
no longer care about POSIX.

> However, the POSIX definition doesn't contradict what I said about the
> definition of __STDC_ISO_10646__ as far as I'm concerned.

Yep - I think we're in violent agreement :)

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
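
To make the "burden is on the application" point concrete, here is a minimal sketch of what an application could do on its side. It is only an illustration: the function names (stays_in_basic_plane, count_characters) are made up for this example and are not part of cygwin, newlib, or POSIX. The first function checks whether a wide string stays within the basic plane, so that the one-wchar_t-per-character model holds; the second counts characters while treating a well-formed surrogate pair as a single character, which is the multi-wchar_t case described above.

```c
#include <stddef.h>
#include <wchar.h>

/* UTF-16 surrogate ranges: high is U+D800..U+DBFF, low is U+DC00..U+DFFF. */
static int is_high_surrogate(wchar_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(wchar_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }
static int is_surrogate(wchar_t u)      { return u >= 0xD800 && u <= 0xDFFF; }

/* Hypothetical helper: returns 1 if no element of s is a surrogate,
   i.e. every wchar_t is a complete basic-plane character; 0 otherwise. */
int stays_in_basic_plane(const wchar_t *s)
{
    for (; *s != L'\0'; s++)
        if (is_surrogate(*s))
            return 0;
    return 1;
}

/* Hypothetical helper: counts characters, treating a high surrogate
   immediately followed by a low surrogate as ONE character.  Any other
   element, including an unpaired surrogate, counts as one on its own. */
size_t count_characters(const wchar_t *s)
{
    size_t n = 0;
    while (*s != L'\0') {
        /* s[1] is safe to read: s[0] is nonzero, so s[1] is at worst
           the terminating null. */
        if (is_high_surrogate(s[0]) && is_low_surrogate(s[1]))
            s += 2;
        else
            s += 1;
        n++;
    }
    return n;
}
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) is encoded in UTF-16 as the pair 0xD834 0xDD1E, so a string holding 'A' followed by that pair fails the basic-plane check and counts as two characters even though it occupies three wchar_t elements.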