public inbox for cygwin@cygwin.com
* --line-regexp option with null data
@ 2015-07-17  1:29 Steven Penny
  2015-07-17  2:04 ` John Hein
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Penny @ 2015-07-17  1:29 UTC (permalink / raw)
  To: cygwin

Consider this command:

    printf 'alpha\nbravo\ncharlie\n' | grep --line-regexp --quiet bravo

grep sees 3 lines separated by newline, and matches the bravo line. Now consider
this command:

    printf 'alpha\0bravo\0charlie\0' | grep --line-regexp --quiet bravo

My thinking tells me that because I have not used `--null-data`, grep should see
1 or even 0 lines separated by newline, and fail to match a `bravo` followed by
newline. However, it does not: it succeeds just like the first command. Why is
this?

Note I also tried this on Debian with Grep 2.2 and it works as expected.

http://stackoverflow.com/q/31467045

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: --line-regexp option with null data
  2015-07-17  1:29 --line-regexp option with null data Steven Penny
@ 2015-07-17  2:04 ` John Hein
  2015-07-17  2:35   ` Andrey Repin
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: John Hein @ 2015-07-17  2:04 UTC (permalink / raw)
  To: cygwin

Steven Penny wrote at 20:29 -0500 on Jul 16, 2015:
 > Consider this command:
 >
 >     printf 'alpha\nbravo\ncharlie\n' | grep --line-regexp --quiet bravo
 >
 > grep sees 3 lines separated by newline, and matches the bravo line. Now consider
 > this command:
 >
 >     printf 'alpha\0bravo\0charlie\0' | grep --line-regexp --quiet bravo
 >
 > My thinking tells me that because I have not used `--null-data`, grep should see
 > 1 or even 0 lines separated by newline, and fail to match a `bravo` followed by
 > newline. However, it does not: it succeeds just like the first command. Why is
 > this?
 >
 > Note I also tried this on Debian with Grep 2.2 and it works as expected.
 >
 > http://stackoverflow.com/q/31467045

cygwin grep is detecting the input as binary which seems to be
overriding the 'match the whole line' behavior of --line-regexp.  Get
rid of --quiet to see that.

That does seem like a bug in the cygwin implementation of grep to me.

As a workaround for this simple example, you can add -a (aka --text)
to force it to treat the input as text.
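
For illustration, a minimal sketch of that check, assuming a POSIX shell and
GNU grep 2.21 or later (the variable name `out` is hypothetical). Dropping
--quiet shows what grep actually reports for the NUL-separated input. The exact
wording of the binary-file notice differs between grep releases, so no expected
output is given here.

```shell
# Same NUL-separated input as in the original report, but without --quiet,
# so that whatever grep reports becomes visible.
# 2>&1 is there because grep 3.5 and later send the binary-file notice to
# standard error instead of standard output.
# "|| true" keeps the sketch going if this grep finds no match at all.
out=$(printf 'alpha\0bravo\0charlie\0' | grep --line-regexp bravo 2>&1) || true

# Show what grep reported. On an affected grep this is a binary-file notice,
# not the plain line "bravo".
printf '%s\n' "$out"
```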


* Re: --line-regexp option with null data
  2015-07-17  2:04 ` John Hein
@ 2015-07-17  2:35   ` Andrey Repin
  2015-07-17  3:04     ` Steven Penny
  2015-07-17  3:44   ` Steven Penny
  2015-07-17 11:55   ` Eric Blake
  2 siblings, 1 reply; 7+ messages in thread
From: Andrey Repin @ 2015-07-17  2:35 UTC (permalink / raw)
  To: John Hein, cygwin

Greetings, John Hein!

> cygwin grep is detecting the input as binary which seems to be
> overriding the 'match the whole line' behavior of --line-regexp.  Get
> rid of --quiet to see that.

> That does seem like a bug in the cygwin implementation of grep to me.

Linux grep will do the same.
null byte = not a text.
Wrong encoding, not matching locale = not a text.

This is an upstream decision. It is arguably consistent… on Linux.
On Windows, of course, this is not the case.

> As a workaround for this simple example, you can add -a (aka --text)
> to force it to treat the input as text.


-- 
With best regards,
Andrey Repin
Friday, July 17, 2015 05:29:11

Sorry for my terrible english...


* Re: --line-regexp option with null data
  2015-07-17  2:35   ` Andrey Repin
@ 2015-07-17  3:04     ` Steven Penny
  2015-07-17  3:26       ` Vince Rice
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Penny @ 2015-07-17  3:04 UTC (permalink / raw)
  To: cygwin

On Thu, Jul 16, 2015 at 9:30 PM, Andrey Repin wrote:
> Linux grep will do the same.
> null byte = not a text.
> Wrong encoding, not matching locale = not a text.

I have repeatedly asked you to stay out of my threads. My experience is you
typically misread, misunderstand or misrepresent most or all of the threads you
comment on. I will ask again, please stop.

As I have said, I already tested this today on Debian Linux 8 with Grep 2.2,
and it works as expected, unlike the Cygwin version, as I have demonstrated.


* Re: --line-regexp option with null data
  2015-07-17  3:04     ` Steven Penny
@ 2015-07-17  3:26       ` Vince Rice
  0 siblings, 0 replies; 7+ messages in thread
From: Vince Rice @ 2015-07-17  3:26 UTC (permalink / raw)
  To: Cygwin Mailing List

> On Jul 16, 2015, at 10:04 PM, Steven Penny <svnpenn@gmail.com> wrote:
> 
> On Thu, Jul 16, 2015 at 9:30 PM, Andrey Repin wrote:
>> Linux grep will do the same.
>> null byte = not a text.
>> Wrong encoding, not matching locale = not a text.
> 
> I have repeatedly asked you to stay out of my threads. My experience is you
> typically misread, misunderstand or misrepresent most or all of the threads you
> comment on. I will ask again, please stop.
> …

And you have been told just as repeatedly, by the people in charge of this
list, that it is not your call, and to stop speaking in such a manner to other
people on this list.

* Re: --line-regexp option with null data
  2015-07-17  2:04 ` John Hein
  2015-07-17  2:35   ` Andrey Repin
@ 2015-07-17  3:44   ` Steven Penny
  2015-07-17 11:55   ` Eric Blake
  2 siblings, 0 replies; 7+ messages in thread
From: Steven Penny @ 2015-07-17  3:44 UTC (permalink / raw)
  To: cygwin

On Thu, Jul 16, 2015 at 9:04 PM, John Hein wrote:
> cygwin grep is detecting the input as binary which seems to be
> overriding the 'match the whole line' behavior of --line-regexp.  Get
> rid of --quiet to see that.

It appears to be intended behavior starting with version 2.21:

> If a file contains data improperly encoded for the current locale,
> and this is discovered before any of the file's contents are output,
> grep now treats the file as binary.

http://savannah.gnu.org/forum/forum.php?forum_id=8152
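
Since the change is tied to grep 2.21, it can help to check which release is
installed. A small sketch, assuming GNU grep and a POSIX shell; the variable
name `grep_banner` is hypothetical:

```shell
# Capture the first line of grep's version banner, which names the release.
grep_banner=$(grep --version | head -n 1)

# Print it so it can be compared against 2.21 by eye.
printf '%s\n' "$grep_banner"
```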


* Re: --line-regexp option with null data
  2015-07-17  2:04 ` John Hein
  2015-07-17  2:35   ` Andrey Repin
  2015-07-17  3:44   ` Steven Penny
@ 2015-07-17 11:55   ` Eric Blake
  2 siblings, 0 replies; 7+ messages in thread
From: Eric Blake @ 2015-07-17 11:55 UTC (permalink / raw)
  To: cygwin

On 07/16/2015 08:04 PM, John Hein wrote:

>  >     printf 'alpha\0bravo\0charlie\0' | grep --line-regexp --quiet bravo
>  >
>  > My thinking tells me that because I have not used `--null-data`, grep should see
>  > 1 or even 0 lines separated by newline, and fail to match a `bravo` followed by
>  > newline. However, it does not: it succeeds just like the first command. Why is
>  > this?
>  >
>  > Note I also tried this on Debian with Grep 2.2 and it works as expected.
>  >
>  > http://stackoverflow.com/q/31467045
> 
> cygwin grep is detecting the input as binary which seems to be
> overriding the 'match the whole line' behavior of --line-regexp.  Get
> rid of --quiet to see that.

The behavior on Linux is the same.  See the NEWS for grep 2.21:

  When searching binary data, grep now may treat non-text bytes as
  line terminators.  This can boost performance significantly.

> 
> That does seem like a bug in the cygwin implementation of grep to me.

No, it is intentional upstream behavior.

> 
> As a workaround for this simple example, you can add -a (aka --text)
> to force it to treat the input as text.

Yes, that IS the correct solution.  You must TELL grep to not treat \0
as a line terminator.
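
To make the difference concrete, a minimal sketch assuming GNU grep and a POSIX
shell; the variable names `text_status` and `null_data_status` are hypothetical.
It feeds the same NUL-separated input through --text and through --null-data
and records each exit status:

```shell
# With --text (-a), NUL is an ordinary byte: the whole input is one long
# line, so --line-regexp has no line consisting solely of "bravo" to match.
text_status=0
printf 'alpha\0bravo\0charlie\0' |
    grep --text --line-regexp --quiet bravo || text_status=$?

# With --null-data (-z), NUL is the record terminator: "bravo" is a complete
# record on its own, so --line-regexp can match it.
null_data_status=0
printf 'alpha\0bravo\0charlie\0' |
    grep --null-data --line-regexp --quiet bravo || null_data_status=$?

# Exit status 0 means a match was found, 1 means no match.
printf 'text: %s, null-data: %s\n' "$text_status" "$null_data_status"
```

Under these assumptions, --text gives the "no match" result the original post
expected, while --null-data matches because each NUL-terminated field is its
own record.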

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

