public inbox for cygwin@cygwin.com
* [ANNOUNCEMENT] Updated [test]: sed-4.4-1
@ 2017-02-11 17:20 Eric Blake (cygwin)
  2017-02-11 23:01 ` Steven Penny
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Blake (cygwin) @ 2017-02-11 17:20 UTC
  To: cygwin

A new experimental release of sed, 4.4-1, has been uploaded and will
soon reach a mirror near you; leaving the current version at 4.2.2-3.

NEWS:
=====
This is a new upstream release, packaged by a new maintainer.  The
release will be marked experimental for a few days, first because it was
built against the not-yet-released cygwin 2.7.0, and second because I
made a tweak that no longer automatically strips carriage returns from
input on binary mounts (things on text mode mounts should remain
unchanged).  Please speak up if this breaks your usage.  For more
details on sed, see the documentation in /usr/share/doc/sed/.

DESCRIPTION:
============
The GNU Sed (Stream EDitor) editor is a stream or batch
(non-interactive) editor.  Sed takes text as input, performs an
operation or set of operations on the text, and outputs the modified
text.  The operations that sed performs (substitutions, deletions,
insertions, etc.) can be specified in a script file or from the command
line.

UPDATE:
=======
To update your installation, click on the "Install Cygwin now" link on
the http://cygwin.com/ web page.  This downloads setup.exe to your
system. Save it and run setup, answer the questions and pick up 'sed'
in the 'Base' category (it should already be selected).

DOWNLOAD:
=========
Note that downloads from cygwin.com aren't allowed due to bandwidth
limitations.  This means that you will need to find a mirror which has
this update; please choose the one nearest to you:
http://cygwin.com/mirrors.html

QUESTIONS:
==========
If you want to make a point or ask a question, the Cygwin mailing list is
the appropriate place.

-- 
Eric Blake
volunteer cygwin sed package maintainer

For more details on this list (including unsubscription), see:
http://sourceware.org/lists.html

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

* Re: [ANNOUNCEMENT] Updated [test]: sed-4.4-1
  2017-02-11 17:20 [ANNOUNCEMENT] Updated [test]: sed-4.4-1 Eric Blake (cygwin)
@ 2017-02-11 23:01 ` Steven Penny
  2017-02-12 11:32   ` Corinna Vinschen
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Penny @ 2017-02-11 23:01 UTC
  To: cygwin

On Sat, 11 Feb 2017 11:06:17, "Eric Blake (cygwin)" wrote:
> I made a tweak that no longer automatically strips carriage returns from
> input on binary mounts

This is great, but can we do it for Awk too?

    $ printf 'hello world\r\n' | awk 1 | od -tcx1
    0000000   h   e   l   l   o       w   o   r   l   d  \n
             68  65  6c  6c  6f  20  77  6f  72  6c  64  0a

Currently you have to make this awful incantation:

    $ unset POSIXLY_CORRECT
    $ printf 'hello world\r\n' | awk -vBINMODE=1 1 | od -tcx1
    0000000   h   e   l   l   o       w   o   r   l   d  \r  \n
             68  65  6c  6c  6f  20  77  6f  72  6c  64  0d  0a

BINMODE only gets parsed on the command line; it is not recognized even in the
BEGIN section. This makes it impossible to write portable Awk scripts with
respect to carriage returns.
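
For scripts that need to know which behavior they are running under, a small
probe can check whether a given filter drops carriage returns. This is only a
sketch; `strips_cr` is a made-up helper name, not something awk or Cygwin
provides:

```shell
# strips_cr is a hypothetical helper (not part of awk or Cygwin).
# It pipes one CRLF-terminated record through the given command and
# succeeds only if exactly two bytes ("x" and LF) come out the other end.
strips_cr() {
    n=$(printf 'x\r\n' | "$@" | wc -c | tr -d ' ')
    [ "$n" -eq 2 ]
}

# Probe the local awk; the answer depends on the platform and awk build.
if strips_cr awk 1; then
    echo "awk drops CR"
else
    echo "awk keeps CR"
fi
```

The probe only tells you what the local awk does with piped input; it does not
change that behavior.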



* Re: [ANNOUNCEMENT] Updated [test]: sed-4.4-1
  2017-02-11 23:01 ` Steven Penny
@ 2017-02-12 11:32   ` Corinna Vinschen
  2017-02-12 15:13     ` Steven Penny
  2017-02-13 19:15     ` Eric Blake
  0 siblings, 2 replies; 9+ messages in thread
From: Corinna Vinschen @ 2017-02-12 11:32 UTC
  To: cygwin

On Feb 11 15:01, Steven Penny wrote:
> On Sat, 11 Feb 2017 11:06:17, "Eric Blake (cygwin)" wrote:
> > I made a tweak that no longer automatically strips carriage returns from
> > input on binary mounts
> 
> This is great, but can we do it for Awk too?
> 
>     $ printf 'hello world\r\n' | awk 1 | od -tcx1
>     0000000   h   e   l   l   o       w   o   r   l   d  \n
>              68  65  6c  6c  6f  20  77  6f  72  6c  64  0a
> 
> Currently you have to make this awful incantation:
> 
>     $ unset POSIXLY_CORRECT
>     $ printf 'hello world\r\n' | awk -vBINMODE=1 1 | od -tcx1
>     0000000   h   e   l   l   o       w   o   r   l   d  \r  \n
>              68  65  6c  6c  6f  20  77  6f  72  6c  64  0d  0a
> 
> BINMODE only gets parsed on the command line; it is not recognized even in the
> BEGIN section. This makes it impossible to write portable Awk scripts with
> respect to carriage returns.

I understand the desire but it's a pretty tricky problem.  awk is
used to manipulate text input in the first place so it treats all
input, files as well as stdin, as text.  So, shall we drop this
behaviour for files only?  Or for stdin as well?  How many existing
setups are bound to fail after a change?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


* Re: [ANNOUNCEMENT] Updated [test]: sed-4.4-1
  2017-02-12 11:32   ` Corinna Vinschen
@ 2017-02-12 15:13     ` Steven Penny
  2017-02-13 14:15       ` Nellis, Kenneth (Conduent)
  2017-02-13 19:15     ` Eric Blake
  1 sibling, 1 reply; 9+ messages in thread
From: Steven Penny @ 2017-02-12 15:13 UTC
  To: cygwin

On Sun, 12 Feb 2017 12:32:22, Corinna Vinschen wrote:
> awk is used to manipulate text input in the first place so it treats all
> input, files as well as stdin, as text.  So, shall we drop this
> behaviour for files only?  Or for stdin as well?  How many existing
> setups are bound to fail after a change?

Perhaps I am missing something, but can't all that be said about Sed too? I just
can't see a situation where we are justified changing one and not the other. They
should either both strip carriage returns or neither.



* RE: [ANNOUNCEMENT] Updated [test]: sed-4.4-1
  2017-02-12 15:13     ` Steven Penny
@ 2017-02-13 14:15       ` Nellis, Kenneth (Conduent)
  2017-02-13 15:53         ` cyg Simple
  0 siblings, 1 reply; 9+ messages in thread
From: Nellis, Kenneth (Conduent) @ 2017-02-13 14:15 UTC
  To: cygwin

From: Steven Penny  
> Perhaps I am missing something, but can't all that be said about Sed too? I
> just can't see a situation where we are justified changing one and not the
> other. They should either both strip carriage returns or neither.

How about grep?

$ printf 'hello\r\nworld\r\n' | grep hello | od -An -tcx1
   h   e   l   l   o  \n
  68  65  6c  6c  6f  0a
$

Are there others?

(BTW, I support the change.)

--Ken Nellis


* Re: [ANNOUNCEMENT] Updated [test]: sed-4.4-1
  2017-02-13 14:15       ` Nellis, Kenneth (Conduent)
@ 2017-02-13 15:53         ` cyg Simple
  2017-02-13 19:07           ` Eric Blake
  0 siblings, 1 reply; 9+ messages in thread
From: cyg Simple @ 2017-02-13 15:53 UTC
  To: cygwin

On 2/13/2017 9:14 AM, Nellis, Kenneth (Conduent) wrote:
> From: Steven Penny  
>> Perhaps I am missing something, but can't all that be said about Sed too? I
>> just can't see a situation where we are justified changing one and not the
>> other. They should either both strip carriage returns or neither.
> 
> How about grep?
> 
> $ printf 'hello\r\nworld\r\n' | grep hello | od -An -tcx1
>    h   e   l   l   o  \n
>   68  65  6c  6c  6f  0a
> $
> 
> Are there others?
> 
> (BTW, I support the change.)
> 

All pipe handles should be binary, or at least there should be an option to
make them that way.  The file handles should be bound to the mounted mode.

-- 
cyg Simple


* Re: [ANNOUNCEMENT] Updated [test]: sed-4.4-1
  2017-02-13 15:53         ` cyg Simple
@ 2017-02-13 19:07           ` Eric Blake
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Blake @ 2017-02-13 19:07 UTC
  To: cygwin


On 02/13/2017 09:53 AM, cyg Simple wrote:
> On 2/13/2017 9:14 AM, Nellis, Kenneth (Conduent) wrote:
>> From: Steven Penny  
>>> Perhaps I am missing something, but can't all that be said about Sed too? I
>>> just can't see a situation where we are justified changing one and not the
>>> other. They should either both strip carriage returns or neither.
>>
>> How about grep?
>>
>> $ printf 'hello\r\nworld\r\n' | grep hello | od -An -tcx1
>>    h   e   l   l   o  \n
>>   68  65  6c  6c  6f  0a
>> $
>>
>> Are there others?
>>
>> (BTW, I support the change.)
>>
> 
> All pipe handles should be binary, or at least there should be an option to
> make them that way.  The file handles should be bound to the mounted mode.

I'm in favor of reducing special cases of FORCED text mode. It's great
on text mounts, but text mounts are discouraged for a reason (slower
computing, surprising results when seeking), and I recently patched bash
to quit forcing text mode (bash 4.3.42-4).

Pipes are indeed binary mode by default (and should stay that way), so
even if you have a long pipeline chain:

cmd1 < file_in | cmd2 | cmd3 | cmd4 > file_out

if file_in and file_out are mounted on text mounts, then cmd1 won't see
any carriage returns, so neither will cmd2, cmd3, or cmd4, and finally
cmd4 writes in text mode back to file_out.

But when you are operating on a binary mount, and WANT carriage returns
to be preserved, forcing text mode at any point in the chain corrupts
all later points in the chain.
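
One way to find the stage that eats the carriage returns is to count CR bytes
before and after each command. A rough sketch; `count_cr` is a hypothetical
helper name, and `sed ''` just stands in for whichever stage is under
suspicion:

```shell
# count_cr is a hypothetical helper: it prints how many CR bytes
# arrive on its standard input.
count_cr() {
    tr -cd '\r' | wc -c | tr -d ' '
}

# CR count of the raw input.
printf 'one\r\ntwo\r\n' | count_cr

# CR count after a stage; sed '' is an empty (no-op) script.
# A smaller number here means this stage is stripping CRs.
printf 'one\r\ntwo\r\n' | sed '' | count_cr
```

Comparing the two numbers stage by stage narrows down where text mode is being
forced.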

There's a big difference between using "rt" to force text mode (which is
what I killed in this sed release), using "rb" to force binary mode
(which is what I use in tar, because tar MUST preserve binary data), and
using "r" (which is what sed now uses) to let the mount point decide
whether CR are important.

So I'd be in favor of a patch to awk dropping forced text mode on binary
mounts.

And I'll look into fixing grep to quit misbehaving as well.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [ANNOUNCEMENT] Updated [test]: sed-4.4-1
  2017-02-12 11:32   ` Corinna Vinschen
  2017-02-12 15:13     ` Steven Penny
@ 2017-02-13 19:15     ` Eric Blake
  2017-02-16  5:05       ` Steven Penny
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Blake @ 2017-02-13 19:15 UTC
  To: cygwin


On 02/12/2017 05:32 AM, Corinna Vinschen wrote:
> I understand the desire but it's a pretty tricky problem.  awk is
> used to manipulate text input in the first place so it treats all
> input, files as well as stdin, as text.  So, shall we drop this
> behaviour for files only?  Or for stdin as well?  How many existing
> setups are bound to fail after a change?

I think part of the confusion is that POSIX states that awk behavior is
only well-defined on "text files" - but that is the POSIX definition of
a text file (no invalid characters in multibyte encoding, no over-long
lines, no NUL bytes, trailing newline), and not strictly related to the
Windows definition of text file (one with CRLF line endings).  But
remember, just because POSIX says that awk is only required to be
well-behaved on text files does not mean that awk cannot be usefully
used on non-text files, and anything we do that silently converts binary
data into corrupted text, when a binary mount was requested, gets in the
way of that usage pattern.

As long as we aren't using fopen("rb") to force binary mode, but rather
just fopen("r") to let the mount mode rule, we should be okay for any
file that we open.  As for stdin, ideally stdin is either from a file
(where the shell opened it according to mount mode) or from a pipeline
(where presumably the other end of the pipe opened the file in the
correct mount mode, or where the user can inject a d2u into the pipeline
if they want CR stripped).

Yes, it means that any existing users that were lazily relying on the
forced text mode to automatically strip CRs will now have to fix their
scripts to add a d2u invocation, but I already hit some of that fallout
when I changed bash to quit forcing text mode.
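
As a rough illustration of what such a fix could look like, the sketch below
injects an explicit CR-stripping stage ahead of awk. `crlf_to_lf` is a made-up
wrapper name, and the sketch assumes d2u acts as a stdin-to-stdout filter when
given no file arguments; where d2u is not installed it falls back to tr, which
is cruder:

```shell
# crlf_to_lf is a hypothetical wrapper, not an existing tool.
# Prefer d2u when it is installed; otherwise fall back to tr.
crlf_to_lf() {
    if command -v d2u >/dev/null 2>&1; then
        d2u             # assumed to filter stdin to stdout with no arguments
    else
        tr -d '\r'      # note: removes every CR, not only those before LF
    fi
}

# The script now states its intent instead of relying on forced text mode.
printf 'hello world\r\n' | crlf_to_lf | awk '{ print $2 }' | od -An -tx1
```

With the explicit stage in place, the result no longer depends on whether awk
(or the mount) happens to strip carriage returns.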

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [ANNOUNCEMENT] Updated [test]: sed-4.4-1
  2017-02-13 19:15     ` Eric Blake
@ 2017-02-16  5:05       ` Steven Penny
  0 siblings, 0 replies; 9+ messages in thread
From: Steven Penny @ 2017-02-16  5:05 UTC
  To: cygwin

On Mon, 13 Feb 2017 13:15:18, Eric Blake wrote:
> Yes, it means that any existing users that were lazily relying on the
> forced text mode to automatically strip CRs will now have to fix their
> scripts to add a d2u invocation, but I already hit some of that fallout
> when I changed bash to quit forcing text mode.

Just to confirm, this is working well, thanks:

    $ printf 'hello world\r\n' | sed '' | od -tcx1
    0000000   h   e   l   l   o       w   o   r   l   d  \r  \n
             68  65  6c  6c  6f  20  77  6f  72  6c  64  0d  0a


