* --line-regexp option with null data
From: Steven Penny @ 2015-07-17 1:29 UTC
To: cygwin
Consider this command:
printf 'alpha\nbravo\ncharlie\n' | grep --line-regexp --quiet bravo
grep sees three newline-separated lines and matches the bravo line. Now consider
this command:
printf 'alpha\0bravo\0charlie\0' | grep --line-regexp --quiet bravo
My thinking tells me that because I have not used `--null-data`, grep should see
1 or even 0 lines separated by newline, and fail to match a `bravo` followed by
newline. However, it does not: it succeeds just like the first command. Why is
this?
Note: I also tried this on Debian with grep 2.2, and it works as expected.
http://stackoverflow.com/q/31467045
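A minimal sketch, assuming GNU grep and a POSIX shell, that runs both commands above and keeps grep's exit status (0 = a line matched, 1 = no line matched) instead of discarding it with --quiet alone. The variable names nl_status and nul_status are illustrative only. The second status is the one that differs between the grep versions discussed in this thread.

```shell
#!/bin/sh
# Run the two commands from the question and record grep's exit status.
# Without pipefail, $? after a pipeline is the status of its last command (grep).
printf 'alpha\nbravo\ncharlie\n' | grep --line-regexp --quiet bravo
nl_status=$?

printf 'alpha\0bravo\0charlie\0' | grep --line-regexp --quiet bravo
nul_status=$?

echo "newline-separated input: exit $nl_status"
echo "NUL-separated input:     exit $nul_status"
```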
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
* Re: --line-regexp option with null data
From: John Hein @ 2015-07-17 2:04 UTC
To: cygwin
Steven Penny svnpenn-at-gmail.com |cygwin_ml_nodigest| wrote at 20:29 -0500 on Jul 16, 2015:
> Consider this command:
>
> printf 'alpha\nbravo\ncharlie\n' | grep --line-regexp --quiet bravo
>
> grep sees 3 lines separated by newline, and matches the bravo line. Now consider
> this command:
>
> printf 'alpha\0bravo\0charlie\0' | grep --line-regexp --quiet bravo
>
> My thinking tells me that because I have not used `--null-data`, grep should see
> 1 or even 0 lines separated by newline, and fail to match a `bravo` followed by
> newline. However it does not, it succeeds just like the first command, why is
> this?
>
> Note I also tried this on Debian with Grep 2.2 and it works as expected.
>
> http://stackoverflow.com/q/31467045
cygwin grep is detecting the input as binary which seems to be
overriding the 'match the whole line' behavior of --line-regexp. Get
rid of --quiet to see that.
That does seem like a bug in the cygwin implementation of grep to me.
As a workaround for this simple example, you can add -a (aka --text)
to force it to treat the input as text.
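A minimal sketch of the -a (--text) workaround, assuming GNU grep: in text mode the NUL bytes are ordinary characters, so the whole stream is one line (alpha, NUL, bravo, NUL, charlie, NUL), and --line-regexp requires that entire line to equal "bravo". The variable name text_status is illustrative only.

```shell
#!/bin/sh
# Force text mode: NUL is no longer a candidate line terminator, so the
# input is a single line that is not exactly "bravo" and -x finds no match.
printf 'alpha\0bravo\0charlie\0' | grep --text --line-regexp --quiet bravo
text_status=$?
echo "with --text: exit $text_status"   # 1 means "no matching line"
```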
* Re: --line-regexp option with null data
From: Andrey Repin @ 2015-07-17 2:35 UTC
To: John Hein, cygwin
Greetings, John Hein!
> cygwin grep is detecting the input as binary which seems to be
> overriding the 'match the whole line' behavior of --line-regexp. Get
> rid of --quiet to see that.
> That does seem like a bug in the cygwin implementation of grep to me.
Linux grep will do the same.
Null byte = not text.
Wrong encoding, not matching the locale = not text.
This is an upstream decision. It is arguably consistent… on Linux.
On Windows, of course, this is not the case.
> As a workaround for this simple example, you can add -a (aka --text)
> to force it to treat the input as text.
--
With best regards,
Andrey Repin
Friday, July 17, 2015 05:29:11
Sorry for my terrible English...
* Re: --line-regexp option with null data
From: Steven Penny @ 2015-07-17 3:04 UTC
To: cygwin
On Thu, Jul 16, 2015 at 9:30 PM, Andrey Repin wrote:
> Linux grep will do the same.
> null byte = not a text.
> Wrong encoding, not matching locale = not a text.
I have repeatedly asked you to stay out of my threads. My experience is you
typically misread, misunderstand or misrepresent most or all of the threads you
comment on. I will ask again, please stop.
As I have said, I have already tested this today on Debian Linux 8 with
grep 2.2, and it works as expected, differently from the Cygwin version, as I
have demonstrated.
* Re: --line-regexp option with null data
From: Vince Rice @ 2015-07-17 3:26 UTC
To: Cygwin Mailing List
> On Jul 16, 2015, at 10:04 PM, Steven Penny <svnpenn@gmail.com> wrote:
>
> On Thu, Jul 16, 2015 at 9:30 PM, Andrey Repin wrote:
>> Linux grep will do the same.
>> null byte = not a text.
>> Wrong encoding, not matching locale = not a text.
>
> I have repeatedly asked you to stay out of my threads. My experience is you
> typically misread, misunderstand or misrepresent most or all of the threads you
> comment on. I will ask again, please stop.
> …
And you have been just as repeatedly told, by the people in charge of this list, that’s not your call, and to stop speaking in such a manner to other people on this list.
* Re: --line-regexp option with null data
From: Steven Penny @ 2015-07-17 3:44 UTC
To: cygwin
On Thu, Jul 16, 2015 at 9:04 PM, John Hein wrote:
> cygwin grep is detecting the input as binary which seems to be
> overriding the 'match the whole line' behavior of --line-regexp. Get
> rid of --quiet to see that.
It appears to be intended behavior starting with version 2.21:
> If a file contains data improperly encoded for the current locale,
> and this is discovered before any of the file's contents are output,
> grep now treats the file as binary.
http://savannah.gnu.org/forum/forum.php?forum_id=8152
* Re: --line-regexp option with null data
From: Eric Blake @ 2015-07-17 11:55 UTC
To: cygwin
On 07/16/2015 08:04 PM, John Hein wrote:
> > printf 'alpha\0bravo\0charlie\0' | grep --line-regexp --quiet bravo
> >
> > My thinking tells me that because I have not used `--null-data`, grep should see
> > 1 or even 0 lines separated by newline, and fail to match a `bravo` followed by
> > newline. However it does not, it succeeds just like the first command, why is
> > this?
> >
> > Note I also tried this on Debian with Grep 2.2 and it works as expected.
> >
> > http://stackoverflow.com/q/31467045
>
> cygwin grep is detecting the input as binary which seems to be
> overriding the 'match the whole line' behavior of --line-regexp. Get
> rid of --quiet to see that.
The behavior on Linux is the same. See the NEWS for grep 2.21:
When searching binary data, grep now may treat non-text bytes as
line terminators. This can boost performance significantly.
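If NUL-separated records are actually what you want, a minimal sketch (assuming GNU grep) is to say so with --null-data (-z), the option named in the original question. NUL then becomes the line terminator by request rather than as a side effect of binary detection, so "bravo" is a record of its own and --line-regexp matches it. The variable name z_status is illustrative only.

```shell
#!/bin/sh
# Explicitly treat NUL as the line terminator: the records are
# "alpha", "bravo" and "charlie", so -x matches the "bravo" record.
printf 'alpha\0bravo\0charlie\0' | grep --null-data --line-regexp --quiet bravo
z_status=$?
echo "with --null-data: exit $z_status"   # 0 means "a record matched"
```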
>
> That does seem like a bug in the cygwin implementation of grep to me.
No, it is intentional upstream behavior.
>
> As a workaround for this simple example, you can add -a (aka --text)
> to force it to treat the input as text.
Yes, that IS the correct solution. You must TELL grep to not treat \0
as a line terminator.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org