public inbox for cygwin-apps@cygwin.com
* grep UTF-8 equivalence class and SMP Unicode chars package check failures
@ 2021-09-01 18:19 Brian Inglis
From: Brian Inglis @ 2021-09-01 18:19 UTC
  To: Cygwin Applications

[-- Attachment #1: Type: text/plain, Size: 890 bytes --]

Hi folks,

I'm running the checks on the upgraded grep package and getting test
failures on some UTF-8 regular expressions. Does Cygwin's support for
Windows locales not cover matching regular expression equivalence
classes or SMP (non-BMP) characters?

$ utf8cp 'à' 'é' '𐐅' $'\360\220\220\205'
à U+0000e0
é U+0000e9
𐐅 U+010405
𐐅 U+010405
$ for c in 'à' 'é' '𐐅' $'\360\220\220\205' ; do
	printf '%s ' "$c"; printf '%s' "$c" | xxd -ps;
   done
à c3a0
é c3a9
𐐅 f0909085
𐐅 f0909085
$ grep '[[=a=]]' <<< 'à'; echo $?
1
$ grep '[[=e=]]' <<< 'é'; echo $?
1
$ grep $'\360\220\220\205' <<< '𐐅'; echo $?
1
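
For context on the last case: U+10405 lies outside the BMP, and the
comments in the attached surrogate-pair test note that a 16-bit wchar_t
cannot represent it as a single unit. A minimal sketch of the arithmetic
that turns the four UTF-8 bytes shown above into a code point and its
UTF-16 surrogate pair (utf8_4byte_to_utf16 is a hypothetical helper name,
not something from grep or Cygwin):

```shell
#!/bin/sh
# utf8_4byte_to_utf16 B1 B2 B3 B4
# Arguments are the hex bytes of one 4-byte UTF-8 sequence.
# Prints the code point and the UTF-16 surrogate pair that encodes it.
# Hypothetical helper for illustration only; no input validation is done.
utf8_4byte_to_utf16() {
  b1=$((0x$1)) b2=$((0x$2)) b3=$((0x$3)) b4=$((0x$4))
  # Lead byte carries 3 payload bits, each continuation byte carries 6.
  cp=$(( ((b1 & 0x07) << 18) | ((b2 & 0x3F) << 12) | ((b3 & 0x3F) << 6) | (b4 & 0x3F) ))
  # Surrogates encode the 20-bit offset above U+10000, 10 bits each.
  v=$(( cp - 0x10000 ))
  hi=$(( 0xD800 + (v >> 10) ))
  lo=$(( 0xDC00 + (v & 0x3FF) ))
  printf 'U+%06X = %04X %04X\n' "$cp" "$hi" "$lo"
}

utf8_4byte_to_utf16 f0 90 90 85   # prints: U+010405 = D801 DC05
```

So any code that handles this character through 16-bit wchar_t has to
treat it as two units rather than one.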

Actual test scripts and log attached.
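
To see whether the results depend on the locale in effect, the checks
above could be rerun under more than one LC_ALL value. A minimal sketch,
where try_grep is a hypothetical helper name and the choice of locales to
compare is only an assumption:

```shell
#!/bin/sh
# try_grep LOCALE PATTERN INPUT
# Runs grep with LC_ALL set to LOCALE and prints grep's exit status
# (0 = match, 1 = no match, 2 = error). Hypothetical helper, not part
# of the grep test suite.
try_grep() {
  printf '%s\n' "$3" | LC_ALL=$1 grep -e "$2" >/dev/null 2>&1
  printf '%s %s -> %s\n' "$1" "$2" "$?"
}

# Compare the equivalence-class patterns under two locales.
for loc in C en_US.UTF-8; do
  try_grep "$loc" '[[=a=]]' 'à'
  try_grep "$loc" '[[=e=]]' 'é'
done
```

Each output line shows the locale, the pattern, and the exit status, so
differing statuses between locales would point at locale handling rather
than at grep itself.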

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]

[-- Attachment #2: grep-cygwin-test-fail-surrogate-pair.sh --]
[-- Type: text/plain, Size: 1935 bytes --]

#!/bin/sh
# Check the handling of characters outside the Unicode BMP.

# Copyright (C) 2013-2021 Free Software Foundation, Inc.

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

. "${srcdir=.}/init.sh"; path_prepend_ ../src

require_en_utf8_locale_
require_compiled_in_MB_support

fail=0

s_pair=$(printf '\360\220\220\205')
printf '%s\n' "$s_pair" > in || framework_failure_

LC_ALL=en_US.UTF-8
export LC_ALL

# On Cygwin, before grep-2.15, this would segfault.
# Require not just non-zero exit status, but exactly 1.
returns_ 1 grep -i anything-else in > out 2>&1 || fail=1
# Expect no output.
compare /dev/null out || fail=1

# This must always match, even on a 16-bit-wchar_t system.
grep . in > out 2> err || fail=1

# On platforms where wchar_t is only 16 bits, wchar_t cannot represent
# the character encoded in 'in'.

# On such old systems the above prints nothing on stdout and a diagnostic
# on stderr.  In that case, return early; otherwise, the following tests
# would all fail.
io_pair=$(cat out):$(cat err)
case $io_pair in
  :'grep: in: binary file matches') Exit $fail;;
  $s_pair:) ;;
  *) fail_ "unexpected output: $io_pair"; fail=1;;
esac

# Also test whether a surrogate-pair in the search string works.
for opt in '' -i -E -F -iE -iF; do
  grep --file=in $opt in > out 2>&1 || fail=1
  compare out in || fail=1
done

Exit $fail

[-- Attachment #3: grep-cygwin-test-fails.log --]
[-- Type: text/plain, Size: 11278 bytes --]

XFAIL: equiv-classes
====================

++ initial_cwd_=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests
+++ testdir_prefix_
+++ printf gt
++ pfx_=gt
+++ mktempd_ /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests gt-equiv-classes.XXXX
+++ case $# in
+++ destdir_=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests
+++ template_=gt-equiv-classes.XXXX
+++ MAX_TRIES_=4
+++ case $destdir_ in
+++ destdir_slash_=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/
+++ case $template_ in
++++ unset TMPDIR
+++ d=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-equiv-classes.CbW6
+++ case $d in
+++ :
+++ test -d /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-equiv-classes.CbW6
++++ ls -dgo /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-equiv-classes.CbW6
+++ perms='drwx------+ 1 0 Aug 12 19:19 /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-equiv-classes.CbW6'
+++ case $perms in
+++ :
+++ echo /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-equiv-classes.CbW6
+++ return
++ test_dir_=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-equiv-classes.CbW6
++ cd /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-equiv-classes.CbW6
++ case $srcdir in
++ builddir=..
++ export srcdir builddir
++ gl_init_sh_nl_='
'
++ IFS=' 	
'
++ for sig_ in 1 2 3 13 15
+++ expr 1 + 128
++ eval 'trap '\''Exit 129'\'' 1'
+++ trap 'Exit 129' 1
++ for sig_ in 1 2 3 13 15
+++ expr 2 + 128
++ eval 'trap '\''Exit 130'\'' 2'
+++ trap 'Exit 130' 2
++ for sig_ in 1 2 3 13 15
+++ expr 3 + 128
++ eval 'trap '\''Exit 131'\'' 3'
+++ trap 'Exit 131' 3
++ for sig_ in 1 2 3 13 15
+++ expr 13 + 128
++ eval 'trap '\''Exit 141'\'' 13'
+++ trap 'Exit 141' 13
++ for sig_ in 1 2 3 13 15
+++ expr 15 + 128
++ eval 'trap '\''Exit 143'\'' 15'
+++ trap 'Exit 143' 15
++ trap remove_tmp_ 0
+ path_prepend_ ../src
+ test 1 '!=' 0
+ path_dir_=../src
+ case $path_dir_ in
+ abs_path_dir_=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/../src
+ case $abs_path_dir_ in
+ PATH=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/../src:/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/src:./src:/usr/local/bin:/usr/bin:/usr/bin:/usr/local/bin:/cygdrive/c/Windows/system32
+ create_exe_shims_ /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/../src
+ case $EXEEXT in
+ return 0
+ shift
+ test 0 '!=' 0
+ export PATH
+ require_en_utf8_locale_
+ path_prepend_ .
+ test 1 '!=' 0
+ path_dir_=.
+ case $path_dir_ in
+ abs_path_dir_=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/.
+ case $abs_path_dir_ in
+ PATH=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/.:/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/../src:/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/src:./src:/usr/local/bin:/usr/bin:/usr/bin:/usr/local/bin:/cygdrive/c/Windows/system32
+ create_exe_shims_ /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/.
+ case $EXEEXT in
+ return 0
+ shift
+ test 0 '!=' 0
+ export PATH
+ case $(get-mb-cur-max en_US.UTF-8) in
++ get-mb-cur-max en_US.UTF-8
+ require_compiled_in_MB_support
+ require_en_utf8_locale_
+ path_prepend_ .
+ test 1 '!=' 0
+ path_dir_=.
+ case $path_dir_ in
+ abs_path_dir_=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/.
+ case $abs_path_dir_ in
+ PATH=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/.:/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/.:/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/../src:/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/src:./src:/usr/local/bin:/usr/bin:/usr/bin:/usr/local/bin:/cygdrive/c/Windows/system32
+ create_exe_shims_ /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/.
+ case $EXEEXT in
+ return 0
+ shift
+ test 0 '!=' 0
+ export PATH
+ case $(get-mb-cur-max en_US.UTF-8) in
++ get-mb-cur-max en_US.UTF-8
+ printf $'\303\251'
+ LC_ALL=en_US.UTF-8
+ grep '[[:lower:]]'
é
+ LC_ALL=en_US.UTF-8
+ export LC_ALL
+ echo à
+ grep '[[=a=]]'
+ Exit 1
+ set +e
+ exit 1
+ exit 1
+ remove_tmp_
+ __st=1
+ cleanup_
+ :
+ test '' = yes
+ cd /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests
+ chmod -R u+rwx /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-equiv-classes.CbW6
+ rm -rf /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-equiv-classes.CbW6
+ exit 1
XFAIL equiv-classes (exit status: 1)
FAIL: surrogate-pair
====================

++ initial_cwd_=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests
+++ testdir_prefix_
+++ printf gt
++ pfx_=gt
+++ mktempd_ /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests gt-surrogate-pair.XXXX
+++ case $# in
+++ destdir_=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests
+++ template_=gt-surrogate-pair.XXXX
+++ MAX_TRIES_=4
+++ case $destdir_ in
+++ destdir_slash_=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/
+++ case $template_ in
++++ unset TMPDIR
+++ d=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-surrogate-pair.N8fs
+++ case $d in
+++ :
+++ test -d /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-surrogate-pair.N8fs
++++ ls -dgo /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-surrogate-pair.N8fs
+++ perms='drwx------+ 1 0 Aug 12 19:21 /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-surrogate-pair.N8fs'
+++ case $perms in
+++ :
+++ echo /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-surrogate-pair.N8fs
+++ return
++ test_dir_=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-surrogate-pair.N8fs
++ cd /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-surrogate-pair.N8fs
++ case $srcdir in
++ builddir=..
++ export srcdir builddir
++ gl_init_sh_nl_='
'
++ IFS=' 	
'
++ for sig_ in 1 2 3 13 15
+++ expr 1 + 128
++ eval 'trap '\''Exit 129'\'' 1'
+++ trap 'Exit 129' 1
++ for sig_ in 1 2 3 13 15
+++ expr 2 + 128
++ eval 'trap '\''Exit 130'\'' 2'
+++ trap 'Exit 130' 2
++ for sig_ in 1 2 3 13 15
+++ expr 3 + 128
++ eval 'trap '\''Exit 131'\'' 3'
+++ trap 'Exit 131' 3
++ for sig_ in 1 2 3 13 15
+++ expr 13 + 128
++ eval 'trap '\''Exit 141'\'' 13'
+++ trap 'Exit 141' 13
++ for sig_ in 1 2 3 13 15
+++ expr 15 + 128
++ eval 'trap '\''Exit 143'\'' 15'
+++ trap 'Exit 143' 15
++ trap remove_tmp_ 0
+ path_prepend_ ../src
+ test 1 '!=' 0
+ path_dir_=../src
+ case $path_dir_ in
+ abs_path_dir_=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/../src
+ case $abs_path_dir_ in
+ PATH=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/../src:/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/src:./src:/usr/local/bin:/usr/bin:/usr/bin:/usr/local/bin:/cygdrive/c/Windows/system32
+ create_exe_shims_ /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/../src
+ case $EXEEXT in
+ return 0
+ shift
+ test 0 '!=' 0
+ export PATH
+ require_en_utf8_locale_
+ path_prepend_ .
+ test 1 '!=' 0
+ path_dir_=.
+ case $path_dir_ in
+ abs_path_dir_=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/.
+ case $abs_path_dir_ in
+ PATH=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/.:/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/../src:/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/src:./src:/usr/local/bin:/usr/bin:/usr/bin:/usr/local/bin:/cygdrive/c/Windows/system32
+ create_exe_shims_ /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/.
+ case $EXEEXT in
+ return 0
+ shift
+ test 0 '!=' 0
+ export PATH
+ case $(get-mb-cur-max en_US.UTF-8) in
++ get-mb-cur-max en_US.UTF-8
+ require_compiled_in_MB_support
+ require_en_utf8_locale_
+ path_prepend_ .
+ test 1 '!=' 0
+ path_dir_=.
+ case $path_dir_ in
+ abs_path_dir_=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/.
+ case $abs_path_dir_ in
+ PATH=/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/.:/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/.:/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/../src:/cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/src:./src:/usr/local/bin:/usr/bin:/usr/bin:/usr/local/bin:/cygdrive/c/Windows/system32
+ create_exe_shims_ /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/.
+ case $EXEEXT in
+ return 0
+ shift
+ test 0 '!=' 0
+ export PATH
+ case $(get-mb-cur-max en_US.UTF-8) in
++ get-mb-cur-max en_US.UTF-8
+ printf $'\303\251'
+ LC_ALL=en_US.UTF-8
+ grep '[[:lower:]]'
é
+ fail=0
++ printf '\360\220\220\205'
+ s_pair=$'\360\220\220\205'
+ printf '%s\n' $'\360\220\220\205'
+ LC_ALL=en_US.UTF-8
+ export LC_ALL
+ returns_ 1 grep -i anything-else in
+ compare /dev/null out
+ compare_dev_null_ /dev/null out
+ test 2 = 2
+ test x/dev/null = x/dev/null
+ test -s out
+ return 0
+ return 0
+ grep . in
++ cat out
++ cat err
+ io_pair=$'\360\220\220\205:'
+ case $io_pair in
+ for opt in '' -i -E -F -iE -iF
+ grep --file=in in
+ fail=1
+ compare out in
+ compare_dev_null_ out in
+ test 2 = 2
+ test xout = x/dev/null
+ test xin = x/dev/null
+ return 2
+ case $? in
+ compare_ out in
+ diff -u out in
--- out	2021-08-12 19:21:42.307952300 +0000
+++ in	2021-08-12 19:21:42.250951300 +0000
@@ -0,0 +1 @@
+𐐅
+ fail=1
+ for opt in '' -i -E -F -iE -iF
+ grep --file=in -i in
+ fail=1
+ compare out in
+ compare_dev_null_ out in
+ test 2 = 2
+ test xout = x/dev/null
+ test xin = x/dev/null
+ return 2
+ case $? in
+ compare_ out in
+ diff -u out in
--- out	2021-08-12 19:21:42.339952800 +0000
+++ in	2021-08-12 19:21:42.250951300 +0000
@@ -0,0 +1 @@
+𐐅
+ fail=1
+ for opt in '' -i -E -F -iE -iF
+ grep --file=in -E in
+ fail=1
+ compare out in
+ compare_dev_null_ out in
+ test 2 = 2
+ test xout = x/dev/null
+ test xin = x/dev/null
+ return 2
+ case $? in
+ compare_ out in
+ diff -u out in
--- out	2021-08-12 19:21:42.372953000 +0000
+++ in	2021-08-12 19:21:42.250951300 +0000
@@ -0,0 +1 @@
+𐐅
+ fail=1
+ for opt in '' -i -E -F -iE -iF
+ grep --file=in -F in
+ fail=1
+ compare out in
+ compare_dev_null_ out in
+ test 2 = 2
+ test xout = x/dev/null
+ test xin = x/dev/null
+ return 2
+ case $? in
+ compare_ out in
+ diff -u out in
--- out	2021-08-12 19:21:42.404951400 +0000
+++ in	2021-08-12 19:21:42.250951300 +0000
@@ -0,0 +1 @@
+𐐅
+ fail=1
+ for opt in '' -i -E -F -iE -iF
+ grep --file=in -iE in
+ fail=1
+ compare out in
+ compare_dev_null_ out in
+ test 2 = 2
+ test xout = x/dev/null
+ test xin = x/dev/null
+ return 2
+ case $? in
+ compare_ out in
+ diff -u out in
--- out	2021-08-12 19:21:42.436953200 +0000
+++ in	2021-08-12 19:21:42.250951300 +0000
@@ -0,0 +1 @@
+𐐅
+ fail=1
+ for opt in '' -i -E -F -iE -iF
+ grep --file=in -iF in
+ fail=1
+ compare out in
+ compare_dev_null_ out in
+ test 2 = 2
+ test xout = x/dev/null
+ test xin = x/dev/null
+ return 2
+ case $? in
+ compare_ out in
+ diff -u out in
--- out	2021-08-12 19:21:42.469952300 +0000
+++ in	2021-08-12 19:21:42.250951300 +0000
@@ -0,0 +1 @@
+𐐅
+ fail=1
+ Exit 1
+ set +e
+ exit 1
+ exit 1
+ remove_tmp_
+ __st=1
+ cleanup_
+ :
+ test '' = yes
+ cd /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests
+ chmod -R u+rwx /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-surrogate-pair.N8fs
+ rm -rf /cygdrive/d/a/scallywag/grep/grep-3.6-1.x86_64/build/tests/gt-surrogate-pair.N8fs
+ exit 1
FAIL surrogate-pair (exit status: 1)

[-- Attachment #4: grep-cygwin-test-fail-equiv-classes.sh --]
[-- Type: text/plain, Size: 231 bytes --]

#!/bin/sh
# Test that equivalence classes work.

. "${srcdir=.}/init.sh"; path_prepend_ ../src

require_en_utf8_locale_
require_compiled_in_MB_support

LC_ALL=en_US.UTF-8
export LC_ALL

echo à | grep '[[=a=]]' > /dev/null
Exit $?
