public inbox for cygwin@cygwin.com
* The grep 3.11 application when used in perl-regexp mode appears to now be broken
From: Michael Goldshteyn @ 2024-03-16 18:00 UTC
  To: cygwin

I just updated my Cygwin64 installation, which includes the grep
utility and its behavior has changed. It no longer works like it used to
for Perl reg-ex matching, as demonstrated below:

Simple test cases:
======================
$ ls -l a
-rwxr-xr-x 1 Michael None 6 Mar 16 12:15 a

$ hexdump -C a
00000000  31 30 30 30 0d 0a                                 |1000..|
00000006

# Notice the CR/LF line ending after the "1000" text, as is usual for DOS text files

# Now let's test grep regular match
$ grep --version
grep (GNU grep) 3.11
Packaged by Cygwin (3.11-1)
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others; see
<https://git.savannah.gnu.org/cgit/grep.git/tree/AUTHORS>.

grep -P uses PCRE2 10.43 2024-02-16

$ grep '000' a
1000

# Match using pcre2
$ grep -P '000' a
1000

# OK, so far so good
$ grep -P '000$' a
# No match

# Put another way
$ grep -c -P '000$' a
0

# Now you may be thinking, OK, it's because of the CR/LF line ending
# But, I present the following
$ pcre2grep --version
pcre2grep version 10.43 2024-02-16

$ pcre2grep '000$' a
1000

# As a further cross-check, the same cygpcre2-8-0.dll is used by both grep.exe
# and pcre2grep.exe, as shown below; I added the "=>" markers to point out
# the DLL in question:

$ ldd grep.exe
        ntdll.dll => /cygdrive/c/Windows/SYSTEM32/ntdll.dll (0x7ffa87d50000)
        KERNEL32.DLL => /cygdrive/c/Windows/System32/KERNEL32.DLL (0x7ffa87700000)
        KERNELBASE.dll => /cygdrive/c/Windows/System32/KERNELBASE.dll (0x7ffa85570000)
        cygwin1.dll => /usr/bin/cygwin1.dll (0x7ff9c84d0000)
        cygintl-8.dll => /usr/bin/cygintl-8.dll (0x5ee2d0000)
=>        cygpcre2-8-0.dll => /usr/bin/cygpcre2-8-0.dll (0x5ec2b0000)
        cygiconv-2.dll => /usr/bin/cygiconv-2.dll (0x3dff10000)

$ ldd pcre2grep.exe
        ntdll.dll => /cygdrive/c/Windows/SYSTEM32/ntdll.dll (0x7ffa87d50000)
        KERNEL32.DLL => /cygdrive/c/Windows/System32/KERNEL32.DLL (0x7ffa87700000)
        KERNELBASE.dll => /cygdrive/c/Windows/System32/KERNELBASE.dll (0x7ffa85570000)
=>        cygpcre2-8-0.dll => /usr/bin/cygpcre2-8-0.dll (0x5ec2b0000)
        cygbz2-1.dll => /usr/bin/cygbz2-1.dll (0x3ed560000)
        cygwin1.dll => /usr/bin/cygwin1.dll (0x7ff9c84d0000)
        cygz.dll => /usr/bin/cygz.dll (0x5ebb10000)

# For what it's worth, I also checked which versions of libintl8 and
# libiconv2 I have:
# libintl8 0.22.4-1
# libiconv2 1.17-1

# As an additional cross-check, here is the following "complete hack":
$ strings cygintl-8.dll | pcre2grep '^\d\.\d\d'
0.22.4
0.22.4

$ strings cygiconv-2.dll | pcre2grep '^\d\.\d\d'
1.17
1.17

# For completeness, here is my CYGWIN environment variable setting and some
# other info:
$ echo "$CYGWIN"
glob:ignorecase winsymlinks:native pipe_byte
$ echo "$CYGWIN64_DIR"
c:\cygwin64
$ which grep
/usr/bin/grep
$ which pcre2grep
/usr/bin/pcre2grep
# No aliases are set up for these, either
$ alias grep pcre2grep
bash: alias: grep: not found
bash: alias: pcre2grep: not found
======================
Further comments:
I do not know in which version of grep.exe this misbehavior of the '-P' switch
(or at least its divergence from pcre2grep) began. I discovered it after
updating my Cygwin64 install to the latest grep version, which likely also
picked up the latest PCRE2 and other dependencies along the way.

Thank you for looking into this and/or providing constructive comments on
the source of the issue,

Michael Goldshteyn
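The test case in this report can be reduced to a few self-contained commands. This is a minimal sketch, assuming bash and a GNU grep built with PCRE2 (-P) support; only the file name a comes from the report. On a grep that does not strip CRs, '000$' finds nothing while a pattern allowing an optional CR matches:

```shell
#!/usr/bin/env bash
# Recreate the 6-byte test file from the report: "1000" followed by CR LF.
printf '1000\r\n' > a

# Dump the raw bytes; the last two are 0d 0a (CR LF).
od -An -tx1 a

# grep drops the LF but keeps the CR, so in effect PCRE2 sees the line
# "1000\r" and '$' cannot match right after "000".
# grep -c then prints 0 and exits with status 1, hence the "|| true".
grep -c -P '000$' a || true

# Allow an optional CR before the end of the line: grep -c prints 1.
grep -c -P '000\r?$' a
```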

* Re: The grep 3.11 application when used in perl-regexp mode appears to now be broken
From: Kevin Schnitzius @ 2024-03-16 19:08 UTC
  To: cygwin


On Saturday, March 16, 2024 at 02:02:31 PM EDT, Michael Goldshteyn via Cygwin <cygwin@cygwin.com> wrote:

> $ grep -c -P '000$' a
> 0

> # Now you may be thinking, OK, it's because of the CR/LF line ending

$ LC_ALL=en_US grep -c --binary-files=text -P '000$' a
0
$ LC_ALL=en_US grep -c --binary-files=text -P '000\r$' a
1

It is an EOL issue; it is also a bug.

"By default, under MS-DOS and MS-Windows, grep guesses whether a file is
text or binary as described for the --binary-files option. If grep decides
the file is a text file, it strips the CR characters from the original file
contents (to make regular expressions with ^ and $ work correctly)."

The current release is not stripping the CR before the LF in DOS text files.
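The CR stripping described in the quoted documentation can be approximated by hand, which also gives a quick workaround. A minimal sketch, assuming bash and GNU grep with -P, and assuming it is acceptable to delete every CR in the input (tr -d '\r' removes all of them, not only the one before the LF):

```shell
#!/usr/bin/env bash
# Same test file as in the report: "1000" followed by CR LF.
printf '1000\r\n' > a

# Delete all CR characters, roughly what the documented DOS/Windows text
# handling did, then run the unmodified pattern on the cleaned stream.
# grep -c prints 1.
tr -d '\r' < a | grep -c -P '000$'
```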

Kevin

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:    https://cygwin.com/ml/#unsubscribe-simple

* Re: The grep 3.11 application when used in perl-regexp mode appears to now be broken
From: Brian Inglis @ 2024-03-16 20:40 UTC
  To: cygwin

On 2024-03-16 13:08, Kevin Schnitzius via Cygwin wrote:
> On Saturday, March 16, 2024 at 02:02:31 PM EDT, Michael Goldshteyn via Cygwin wrote:
>> $ grep -c -P '000$' a
>> 0
> 
>> # Now you may be thinking, OK, it's because of the CR/LF line ending
> 
> $ LC_ALL=en_US grep -c --binary-files=text -P '000$' a
> 0
> $ LC_ALL=en_US grep -c --binary-files=text -P '000\r$' a
> 1
> 
> It is an EOL issue; it is also a bug.
> 
> "By default, under MS-DOS and MS-Windows, grep guesses whether a file is
> text or binary as described for the --binary-files option. If grep decides
> the file is a text file, it strips the CR characters from the original file
> contents (to make regular expressions with ^ and $ work correctly)."
> 
> The current release is not stripping EOL characters correctly in the case of DOS text files.

You must have updated from very old packages, as this has been the case since
2017-02, when Cygwin 2.7 was released along with gawk 4.1.4-3, grep 3.0-2, and
sed 4.4, all of which operate strictly according to POSIX:

https://cygwin.com/pipermail/cygwin-announce/2017-February/007795.html
https://cygwin.com/pipermail/cygwin-announce/2017-February/007796.html
https://cygwin.com/pipermail/cygwin-announce/2017-February/007797.html

Similar changes may have occurred in some coreutils around that time.

Since then, those of us on current releases have dealt with CRs before LFs at
EoL, where necessary, by using Cygwin text mounts or d2u/dos2unix, by matching
them with BRE '\r\?$' or ERE '\r?$', or by stripping them with gawk
'sub(/\r$/,"")' or sed 's/\r$//'.
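The pattern-based workarounds listed above can be sketched as follows, assuming bash, GNU sed (which understands \r in its regular expressions), and any POSIX awk (gawk on Cygwin). GNU grep's own BRE/ERE syntax has no \r escape (recent versions warn about the stray backslash), so the grep line assumes bash and uses $'...' quoting to put a literal CR into the pattern. Text mounts and d2u/dos2unix are not shown. Each command prints 1 for the test file:

```shell
#!/usr/bin/env bash
# Test file from the thread: "1000" followed by CR LF.
printf '1000\r\n' > a

# sed: strip a CR just before the end of the line, then match normally.
sed 's/\r$//' a | grep -c '000$'

# sed: match with an optional CR before the end of the line (BRE, then ERE).
sed -n '/000\r\?$/p' a | wc -l
sed -E -n '/000\r?$/p' a | wc -l

# awk: remove a trailing CR from each record, then test the cleaned record.
awk '{ sub(/\r$/, "") } /000$/ { n++ } END { print n + 0 }' a

# grep: bash turns $'\r' into a literal CR; '?' makes it optional (ERE).
grep -c -E $'000\r?$' a
```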

Please note that Cygwin does not consider its releases "Windows" versions but 
POSIX versions.

It appears that pcre2, and thus pcre2grep, may default its line terminators to
"anycrlf" so as to work with files from any environment; that behavior is not
standardized by POSIX as of 202X.
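The effect of the line-terminator convention can be checked directly, since pcre2grep lets you choose it with -N (--newline). A sketch, assuming pcre2grep is installed and using the test file from the thread: with LF as the only terminator the CR stays part of the line and '000$' should find nothing, while ANYCRLF treats CR LF as the terminator and the line should match. This only demonstrates the option; it does not show which default a particular build uses:

```shell
#!/usr/bin/env bash
# Test file from the thread: "1000" followed by CR LF.
printf '1000\r\n' > a

# LF is the only line terminator: the line is "1000\r", so '000$' does not
# match; pcre2grep exits with status 1, hence the "|| true".
pcre2grep -c -N LF '000$' a || true

# CR, LF, and CRLF all end a line: the line is "1000", so '000$' matches.
pcre2grep -c -N ANYCRLF '000$' a
```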

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry

Thread overview: 3 messages
2024-03-16 18:00 The grep 3.11 application when used in perl-regexp mode appears to now be broken Michael Goldshteyn
2024-03-16 19:08 ` Kevin Schnitzius
2024-03-16 20:40   ` Brian Inglis
