On 2021-11-24 02:25, Corinna Vinschen via Cygwin wrote:
> On Nov 24 09:36, Duncan Roe wrote:
>> On Tue, Nov 23, 2021 at 11:18:25AM -0700, Brian Inglis wrote:
>>>> On Nov 23 19:27, Duncan Roe wrote:
>>>>> Btw to whoever maintains grep for cygwin: 'make check' should pass on
>>>>> next release (I patched out the surrogate-pair failure).
>>>
>>> I had no problems with test-raise last release.
>>
>> I don't remember having a problem with it even a few weeks ago.
>>
>>> I did with surrogate pairs but after spending too much time on all the test
>>> infrastructure around that, decided it was a low probability event, and wait
>>> until anyone complains to refer it upstream.
>>
>> I wasted time on that too. That's why I patched surrogate-pair to not do its 3rd
>> test if 'uname -s' indicates Cygwin.
>>
>> For the full story, see https://debbugs.gnu.org/cgi/bugreport.cgi?bug=27555#5
>
> What is that "permanent restriction" in Cygwin? Is that something we
> could fix or something unfixable? Did you try to debug Cygwin in terms
> of that problem? If not, could you extract a reduced, very simple
> stand-alone testcase for further debugging?
>
>>> Do Cygwin and/or Windows support surrogate pairs in UTF-8?
>
> You mean UTF-16. UTF-8 doesn't know surrogate pairs, UTF-16 does.
> Originally there was UCS-2, 16 bits, with only 65536 code points.
> However, Unicode left the BMP already with version 2.0 in 1996, so
> UTF-16 and surrogate pairs became necessary. Windows as well as Cygwin
> support them.

How does Cygwin support UTF-16 locales with surrogate pairs?
Are they the "native" locales inherited from Windows when no other
locale is specified, e.g. UTF-8 or some OEM SBCS or MBCS?

>> There are 3 tests in surrogate-pair and only the 3rd one failed. So I guess
>> surrogate pairs in UTF-8 "mostly work".
>
> UTF-16. The surrogate stuff is evil at times. Have a look at the
> __utf8_wctomb function in
> https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdlib/wctomb_r.c
> Lone surrogate halves in an input stream are a problem, for instance.

Thus the confusion with the grep surrogate pair tests, which appear to be
running under a UTF-8 locale: see the attached surrogate pair extract from

	cygport --debug grep.cygport check

When I try to rerun the cygport build, most tests are now reported as
"skipped test: failed to find an adequate shell SKIP ... (exit status: 77)"!

Something more may have changed (in gnulib?) that invalidates the Cygwin
shell(s), in something updated since that grep release in August, as I am
getting the same skipped tests under GitHub CI, although it could just be
that something expects say bash > 4.4 or even >= 5!

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]
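
To illustrate the UTF-16 versus UTF-8 distinction made in the quoted discussion, here is a minimal sketch in Python. It is not the newlib `__utf8_wctomb` code and says nothing about how grep's surrogate-pair test is written; the helper names `to_surrogate_pair` and `from_surrogate_pair` are made up for this example. It shows that a code point outside the BMP needs two 16-bit units in UTF-16 but is a single 4-byte sequence in UTF-8, and that a lone surrogate half has no valid UTF-8 encoding (Python's strict encoder rejects it; other implementations may behave differently).

```python
def to_surrogate_pair(cp):
    """Split a code point above the BMP into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    v = cp - 0x10000                    # 20 significant bits remain
    high = 0xD800 + (v >> 10)           # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)          # bottom 10 bits -> low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a valid (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = 0x1F600                            # an arbitrary code point outside the BMP
high, low = to_surrogate_pair(cp)
print(f"U+{cp:X} -> UTF-16 units {high:04X} {low:04X}")
print("UTF-8 bytes:", chr(cp).encode("utf-8").hex(" "))

# A lone high surrogate is not a Unicode scalar value, so Python's strict
# UTF-8 encoder refuses to encode it.
try:
    "\ud83d".encode("utf-8")
    lone_ok = True
except UnicodeEncodeError:
    lone_ok = False
print("lone surrogate encodable as UTF-8:", lone_ok)
```

The point of the sketch is only that surrogates are a UTF-16 mechanism: in a UTF-8 locale they can appear on the wide-character side of a conversion but never as valid bytes in the stream, which is where lone halves become awkward.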