public inbox for cygwin@cygwin.com
* CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
@ 2017-06-07 16:23 Soegtrop, Michael
  2017-06-07 17:23 ` Eric Blake
  0 siblings, 1 reply; 37+ messages in thread
From: Soegtrop, Michael @ 2017-06-07 16:23 UTC (permalink / raw)
  To: cygwin

Dear Cygwin Team,

in the latest version of cygwin with sed-4.4-1.tar.bz2 the behavior of sed regarding handling CR-LF sequences changed. The last version I tried where this was still working is sed-4.2.2-3.tar.bz2.

I would say that the documented behavior in both versions is that they replace CR-LF with LF, because both versions have a documented option -b to switch off this behavior. But in version 4.4-1 this doesn't work anymore. This breaks a lot of build scripts, because var=$( prog | sed .) now usually adds a carriage return to the variable in case the called prog is a MinGW program - a common scenario in MinGW cross development.

Is this considered a bug in sed 4.4-1 or is the old behavior and the -b option considered deprecated and it was just forgotten to remove the documentation for the -b option?

Best regards,

Michael

Old behavior (sed 4.2.2-3):

$ echo -e "a\r\nb\r\nc\r" | sed 's/a//' | hexdump -C
+ echo -e 'a\r\nb\r\nc\r'
+ sed s/a//
+ hexdump -C
00000000  0a 62 0a 63 0a                                    |.b.c.|
00000005

New behavior (sed 4.4-1):

$ echo -e "a\r\nb\r\nc\r" | sed 's/a//' | hexdump -C
+ echo -e 'a\r\nb\r\nc\r'
+ sed s/a//
+ hexdump -C
00000000  0d 0a 62 0d 0a 63 0d 0a                           |..b..c..|
00000008
Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-07 16:23 CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts Soegtrop, Michael
@ 2017-06-07 17:23 ` Eric Blake
  2017-06-07 19:26   ` Brian Inglis
  2017-06-08  8:50   ` Soegtrop, Michael
  0 siblings, 2 replies; 37+ messages in thread
From: Eric Blake @ 2017-06-07 17:23 UTC (permalink / raw)
  To: cygwin, michael.soegtrop


[-- Attachment #1.1: Type: text/plain, Size: 1257 bytes --]

On 06/07/2017 11:23 AM, Soegtrop, Michael wrote:
> Dear Cygwin Team,
> 
> in the latest version of cygwin with sed-4.4-1.tar.bz2 the behavior of sed regarding handling CR-LF sequences changed.

And the change was documented (don't you read the release notes?)
https://cygwin.com/ml/cygwin-announce/2017-02/msg00036.html

> I would say that the documented behavior in both versions is that they replace CR-LF with LF,

No, the documented behavior is that CR-LF is converted to LF only for
text-mounted files; but pipelines are default binary-mounted.  If you
want to strip CR from a pipeline, then make it explicit.

> var=$( prog | sed .)

Rewrite that to var=$( prog | tr -d '\r' | sed .)

> Is this considered a bug in sed 4.4-1 or is the old behavior and the -b option considered deprecated and it was just forgotten to remove the documentation for the -b option?

The -b option still works (forcing binary mode when you otherwise have a
text mount); what changed was that the default behavior of pipelines is
now binary instead of text, as binary is a better default mode for Linux
compatibility.
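
A minimal sketch of the explicit filtering suggested above. The function prog here is a hypothetical stand-in for a native Windows (MinGW) program that writes CR-LF line endings; it is not taken from the original report.

```shell
# Hypothetical stand-in for a MinGW program emitting CR-LF line endings.
prog() { printf 'value\r\n'; }

# No filtering: with binary-mode pipes the CR survives and ends up in the variable.
raw=$(prog | sed 's/^/x/')

# Explicit filtering: tr removes the CR before sed sees the data.
clean=$(prog | tr -d '\r' | sed 's/^/x/')

# Dump both results byte by byte to make any stray CR visible.
printf '%s\n' "$raw" | od -c
printf '%s\n' "$clean" | od -c
```

Note that tr -d '\r' removes every CR in the stream, not only those at line ends, so it suits only data where a CR never carries meaning.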

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]


* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-07 17:23 ` Eric Blake
@ 2017-06-07 19:26   ` Brian Inglis
  2017-06-08  8:50   ` Soegtrop, Michael
  1 sibling, 0 replies; 37+ messages in thread
From: Brian Inglis @ 2017-06-07 19:26 UTC (permalink / raw)
  To: cygwin

On 2017-06-07 11:23, Eric Blake wrote:
> On 06/07/2017 11:23 AM, Soegtrop, Michael wrote:
>> in the latest version of cygwin with sed-4.4-1.tar.bz2 the
>> behavior of sed regarding handling CR-LF sequences changed.
> And the change was documented (don't you read the release notes?)
> https://cygwin.com/ml/cygwin-announce/2017-02/msg00036.html
>> I would say that the documented behavior in both versions is that
>> they replace CR-LF with LF,
> No, the documented behavior is that CR-LF is converted to LF only
> for text-mounted files; but pipelines are default binary-mounted. If
> you want to strip CR from a pipeline, then make it explicit.
>> var=$( prog | sed .)
> Rewrite that to var=$( prog | tr -d '\r' | sed .)

For compatibility var=$( prog | sed -b -e 's/\r$//' ...)

>> Is this considered a bug in sed 4.4-1 or is the old behavior and
>> the -b option considered deprecated and it was just forgotten to
>> remove the documentation for the -b option?
> The -b option still works (forcing binary mode when you otherwise
> have a text mount); what changed was that the default behavior of
> pipelines is now binary instead of text, as binary is a better
> default mode for Linux compatibility.

Note that -b, --binary is not documented in any sed man page, only in
--help, and info pages (info -- sed -b), where it is explained as "This
option is available on every platform, but is only effective where the
operating system makes a distinction between text files and binary files".

On Cygwin that is now only files on text mounts; that also now applies
to many other Cygwin text utils: we have all had to make minor script
tweaks to deal with possible Windows text file input.

If you use sed -b -e 's/\r$//' where you currently use plain sed and
Windows text files may be input, it will work compatibly across platforms.
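
To illustrate how this differs from the tr approach, here is a small sketch assuming GNU sed (the -b option and the \r escape are GNU extensions). The sample function is made up for the illustration.

```shell
# Sample input: one CR in the middle of a line and one at the end (CR-LF ending).
sample() { printf 'a\rb\r\n'; }

# tr deletes every CR, including the one inside the line.
all_removed=$(sample | tr -d '\r')

# The anchored sed expression deletes only the CR directly before the LF.
eol_removed=$(sample | sed -b -e 's/\r$//')

# Dump both results byte by byte for comparison.
printf '%s\n' "$all_removed" | od -c
printf '%s\n' "$eol_removed" | od -c
```

The anchored form is the safer choice when a CR inside a line may be meaningful data.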

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada


* RE: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-07 17:23 ` Eric Blake
  2017-06-07 19:26   ` Brian Inglis
@ 2017-06-08  8:50   ` Soegtrop, Michael
  2017-06-08 13:31     ` Vince Rice
                       ` (3 more replies)
  1 sibling, 4 replies; 37+ messages in thread
From: Soegtrop, Michael @ 2017-06-08  8:50 UTC (permalink / raw)
  To: Eric Blake, cygwin

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2945 bytes --]

Dear Eric,

> No, the documented behavior is that CR-LF is converted to LF only for text-
> mounted files; but pipelines are default binary-mounted.  If you want to strip
> CR from a pipeline, then make it explicit.
> 
> > var=$( prog | sed .)
> 
> Rewrite that to var=$( prog | tr -d '\r' | sed .)

I have two problems with this:

1.) I build many (~ 50) unix libs and tools MinGW cross on cygwin from sources and this breaks many of the configure and other scripts. Feeding back the fixes to the individual lib/tool maintainers will take quite some time and also results in lengthy discussion why they should care about crappy DOS artefacts at all. A compatibility option via environment variable would have been nice.

2.) It is very hard to interpret the documentation in this way. I am citing from https://www.gnu.org/software/sed/manual/sed.html:

-b --binary  
This option is available on every platform, but is only effective where the operating system makes a distinction between text files and binary files. When such a distinction is made—as is the case for MS-DOS, Windows, Cygwin—text files are composed of lines separated by a carriage return and a line feed character, and sed does not see the ending CR. When this option is specified, sed will open input files in binary mode, thus not requesting this special processing and considering lines to end at a line feed.

This doesn't say what is treated as a text file and what is treated as a binary file and one can reasonably assume that a text tool like sed opens everything not explicitly declared as binary as text, if a documented option like -b exists.

This cygwin sed behavior is documented in https://cygwin.com/cygwin-ug-net/using-textbinary.html but I wouldn't expect people using sed on cygwin will find this.

In summary I would say that the behavior of sed in cygwin is documented in the cygwin documentation, but it is contradicting the documentation of sed itself, and possibly the intended function of sed as a text processing tool.

I must admit that building Linux stuff for MinGW cross on cygwin works substantially better than doing this on MSys/MSys2. The number of patches I need is small, so the decisions the cygwin team took seem to be the right ones. But this change adds at least one order of magnitude in my "number of patches required" statistics. 

Best regards,

Michael


* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-08  8:50   ` Soegtrop, Michael
@ 2017-06-08 13:31     ` Vince Rice
  2017-06-08 14:52       ` Soegtrop, Michael
  2017-06-08 15:04       ` Eric Blake
  2017-06-08 15:08     ` Eric Blake
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 37+ messages in thread
From: Vince Rice @ 2017-06-08 13:31 UTC (permalink / raw)
  To: cygwin

> On Jun 8, 2017, at 3:50 AM, Soegtrop, Michael wrote:
> 
> Dear Eric,
> 
>> No, the documented behavior is that CR-LF is converted to LF only for text-
>> mounted files; but pipelines are default binary-mounted.  If you want to strip
>> CR from a pipeline, then make it explicit.
>> 
>>> var=$( prog | sed .)
>> 
>> Rewrite that to var=$( prog | tr -d '\r' | sed .)
> 
> I have two problems with this:
> 
> 1.) I build many (~ 50) unix libs and tools MinGW cross on cygwin from sources and this breaks many of the configure and other scripts. Feeding back the fixes to the individual lib/tool maintainers will take quite some time and also results in lengthy discussion why they should care about crappy DOS artefacts at all. A compatibility option via environment variable would have been nice.
> 
> 2.) It is very hard to interpret the documentation in this way. I am citing from https://www.gnu.org/software/sed/manual/sed.html:
> 
> -b --binary  
> This option is available on every platform, but is only effective where the operating system makes a distinction between text files and binary files. When such a distinction is made—as is the case for MS-DOS, Windows, Cygwin—text files are composed of lines separated by a carriage return and a line feed character, and sed does not see the ending CR. When this option is specified, sed will open input files in binary mode, thus not requesting this special processing and considering lines to end at a line feed.
> 
> This doesn't say what is treated as a text file and what is treated as a binary file and one can reasonably assume that a text tool like sed opens everything not explicitly declared as binary as text, if a documented option like -b exists.
> 
> This cygwin sed behavior is documented in https://cygwin.com/cygwin-ug-net/using-textbinary.html but I wouldn't expect people using sed on cygwin will find this.
> 
> In summary I would say that the behavior of sed in cygwin is documented in the cygwin documentation, but it is contradicting the documentation of sed itself, and possibly the intended function of sed as a text processing tool.
> 
> I must admit that building Linux stuff for MinGW cross on cygwin works substantially better than doing this on MSys/MSys2. The number of patches I need is small, so the decisions the cygwin team took seem to be the right ones. But this change adds at least one order of magnitude in my "number of patches required" statistics. 

Use binary mounts. The root of the problem is using text mounts in the first place.

* RE: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-08 13:31     ` Vince Rice
@ 2017-06-08 14:52       ` Soegtrop, Michael
  2017-06-08 15:04       ` Eric Blake
  1 sibling, 0 replies; 37+ messages in thread
From: Soegtrop, Michael @ 2017-06-08 14:52 UTC (permalink / raw)
  To: Vince Rice, cygwin

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1111 bytes --]

Dear Vince,

> Use binary mounts. The root of the problem is using text mounts in the first
> place.

How would this help to convince sed to do CR-LF to LF conversion, especially on piped input? 

Also I think all my mounts are binary:

$ mount
D:/bin/cygwin /bin on /usr/bin type ntfs (binary,auto)
D:/bin/cygwin /lib on /usr/lib type ntfs (binary,auto)
D:/bin/cygwin on / type ntfs (binary,auto)
C: on /cygdrive/c type ntfs (binary,posix=0,user,noumount,auto)
D: on /cygdrive/d type ntfs (binary,posix=0,user,noumount,auto)

Best regards,

Michael

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-08 13:31     ` Vince Rice
  2017-06-08 14:52       ` Soegtrop, Michael
@ 2017-06-08 15:04       ` Eric Blake
  1 sibling, 0 replies; 37+ messages in thread
From: Eric Blake @ 2017-06-08 15:04 UTC (permalink / raw)
  To: cygwin


[-- Attachment #1.1: Type: text/plain, Size: 602 bytes --]

On 06/08/2017 08:31 AM, Vince Rice wrote:

> Use binary mounts. The root of the problem is using text mounts in the first place.

Huh? The OP _wants_ the CR-LF conversion to LF that happens on text
mounts (but does NOT happen on binary mounts).  Also, binary mounts are
the default, you have to go out of your way to explicitly enable text
mounts.  But enabling text mounts only helps for files in the file
system; it does not help pipes from windows programs.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]


* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-08  8:50   ` Soegtrop, Michael
  2017-06-08 13:31     ` Vince Rice
@ 2017-06-08 15:08     ` Eric Blake
  2017-06-08 17:34       ` cyg Simple
  2017-06-08 18:51     ` L A Walsh
  2017-06-09 13:50     ` Brian Inglis
  3 siblings, 1 reply; 37+ messages in thread
From: Eric Blake @ 2017-06-08 15:08 UTC (permalink / raw)
  To: Soegtrop, Michael, cygwin


[-- Attachment #1.1: Type: text/plain, Size: 1280 bytes --]

On 06/08/2017 03:50 AM, Soegtrop, Michael wrote:

> 1.) I build many (~ 50) unix libs and tools MinGW cross on cygwin from sources and this breaks many of the configure and other scripts. Feeding back the fixes to the individual lib/tool maintainers will take quite some time and also results in lengthy discussion why they should care about crappy DOS artefacts at all. A compatibility option via environment variable would have been nice.

At one point in the distant past, we DID have an environment variable
($CYGWIN) which included an option to force all pipelines to behave as
text-mount rather than binary-mount.  I don't remember why it was ripped
out.  It may be worth a patch to cygwin1.dll to re-add some way to
enable text-mount pipelines - or maybe even limited-context text-mount
pipelines (if Cygwin already has an easy way to decide whether the
sending end of a pipeline is a cygwin or native windows process, then
the conditional decision would be that input from the pipeline is in
text mode if the sender is non-cygwin; while output to a pipeline is
always binary).  But I'm not in a position to write such a patch.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]


* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-08 15:08     ` Eric Blake
@ 2017-06-08 17:34       ` cyg Simple
  0 siblings, 0 replies; 37+ messages in thread
From: cyg Simple @ 2017-06-08 17:34 UTC (permalink / raw)
  To: cygwin

On 6/8/2017 11:08 AM, Eric Blake wrote:
> On 06/08/2017 03:50 AM, Soegtrop, Michael wrote:
> 
>> 1.) I build many (~ 50) unix libs and tools MinGW cross on cygwin from sources and this breaks many of the configure and other scripts. Feeding back the fixes to the individual lib/tool maintainers will take quite some time and also results in lengthy discussion why they should care about crappy DOS artefacts at all. A compatibility option via environment variable would have been nice.
> 
> At one point in the distant past, we DID have an environment variable
> ($CYGWIN) which included an option to force all pipelines to behave as
> text-mount rather than binary-mount.  I don't remember why it was ripped
> out.  It may be worth a patch to cygwin1.dll to re-add some way to
> enable text-mount pipelines - or maybe even limited-context text-mount
> pipelines (if Cygwin already has an easy way to decide whether the
> sending end of a pipeline is a cygwin or native windows process, then
> the conditional decision would be that input from the pipeline is in
> text mode if the sender is non-cygwin; while output to a pipeline is
> always binary).  But I'm not in a position to write such a patch.
> 

I will add that the original MSYS (I haven't used MSYS2, so I don't know
about it) used binary mode pipes from the start.  There were many
discussions about this but the stance for binary pipes remained.  You
always want the piped data to contain the full set of data so that the
peek/seek functions work properly; otherwise the counts from the read
function are just wrong.

There are options to make MinGW stdio default to binary mode, so that
I/O uses the standard LF instead of CR-LF, if you add that option object
to the executable build.  The only time you really need CR-LF is when
dealing with dreadful Notepad and a few other MS specific utilities.

-- 
cyg Simple


* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-08  8:50   ` Soegtrop, Michael
  2017-06-08 13:31     ` Vince Rice
  2017-06-08 15:08     ` Eric Blake
@ 2017-06-08 18:51     ` L A Walsh
  2017-06-08 19:57       ` Eric Blake
  2017-06-09 13:50     ` Brian Inglis
  3 siblings, 1 reply; 37+ messages in thread
From: L A Walsh @ 2017-06-08 18:51 UTC (permalink / raw)
  To: michael.soegtrop; +Cc: cygwin

Soegtrop, Michael wrote:
> Dear Eric,
>
>   
>> No, the documented behavior is that CR-LF is converted to LF only for text-
>> mounted files; but pipelines are default binary-mounted.  If you want to strip
>> CR from a pipeline, then make it explicit.
>>
>>     
>>> var=$( prog | sed .)
>>>       
>> Rewrite that to var=$( prog | tr -d '\r' | sed .)
>>     
>
> I have two problems with this:
>
> 1.) I build many (~ 50) unix libs and tools MinGW cross on cygwin from sources and this breaks many of the configure and other scripts. 
---
    But didn't one have to use 'sed -b' before, in order to
strip out CR's?  I.e. wouldn't all the individual lib/tool maintainers have
been required to add '-b' to their sed scripts?  Seems either way,
you have the undesirability of forcing external products to change to
support cygwin.

    Whereas, what I'd wonder is, how you are supplying input to sed
in the first place?  I.e. how did CR's get into the stream to begin with.
If you used cygwin and some tool on cygwin generated CR's into the output
stream, I'd think that'd be a problem (or bug).  But if you are mixing
DOS/Win-progs w/cygwin, then you need to adapt the DOS/Win progs' outputs to
not have CR in them.

Or am I missing something basic?





* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-08 18:51     ` L A Walsh
@ 2017-06-08 19:57       ` Eric Blake
  2017-06-09  8:14         ` Soegtrop, Michael
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Blake @ 2017-06-08 19:57 UTC (permalink / raw)
  To: cygwin


[-- Attachment #1.1: Type: text/plain, Size: 2982 bytes --]

On 06/08/2017 01:51 PM, L A Walsh wrote:

>> 1.) I build many (~ 50) unix libs and tools MinGW cross on cygwin from
>> sources and this breaks many of the configure and other scripts. 
> ---
>    But didn't one have to use 'sed -b' before, in order to
> strip out CR's?

No, the exact opposite.  It used to be that you HAD to use 'sed -b' to
preserve CRs on a binary mount; now binary mounts preserve CRs
automatically, making 'sed -b' a no-op on binary mounts.  (This is
closer to Linux behavior, where 'sed' preserves CRs automatically
because everything is binary mount, and 'sed -b' is a no-op).  On text
mounts, 'sed -b' allows you to preserve CRs where they would otherwise
be stripped automatically.

>  I.e. wouldn't all the individual lib/tool maintainers have
> been required to add '-b' to their sed scripts?

Sort of. The problem was that it used to be difficult to write portable
scripts that worked on Cygwin and non-cygwin and still dealt with CRs.
That's because you could not rely on 'sed -b' existing (not all the
world uses GNU sed, and POSIX doesn't require -b to exist).  But if you
omitted the -b on Cygwin, your data was silently corrupted.

With the change back in February, now Cygwin sed defaults to POSIX
behavior on binary mounts, and the ONLY people that still have to use
'sed -b' are those who use text mounts; while remembering that text
mounts are not the default.

>  Seems either way,
> you have the undesirability of forcing external products to change to
> support cygwin.

External products were being lazy by relying on cygwin to strip CR when
they should have stripped it themselves.  But 'sed -b' does NOT strip CR
(it is the exact opposite, of keeping CR unstripped).

> 
>    Whereas, what I'd wonder is, how you are supplying input to sed
> in the first place?  I.e. how did CR's get into the stream to begin with.
> If you used cygwin and some tool on cygwin generated CR's into the output
> stream, I'd think that'd be a problem (or bug).  But if you are mixing
> DOS/Win-progs w/cygwin, then you need to adapt the DOS/Win progs'
> outputs to
> not have CR in them.

Exactly - it used to be you could be lazy and feed the DOS/Win prog
output (with CRs) to cygwin, and cygwin would ignore the CR - but that
laziness came at a price that it would silently corrupt data for someone
that was not aware that they needed the non-portable 'sed -b' to
preserve CR when operating on known-binary data.  Yes, the change is
forcing clients of external data to be more explicit about the CR in
their data, but in my mind, that's a GOOD thing - it's always better to
be explicit about intentions, and the new behavior is something YOU
control by whether you pre-filter the data, and not something that sed
FORCED on you by using text mode against your wishes.
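
One way to find out which pipelines need such a pre-filter is to count the lines that end in CR. A rough sketch, where the sample function is a hypothetical placeholder for whatever program feeds the pipeline:

```shell
# Hypothetical producer: one CR-LF line and one plain LF line.
sample() { printf 'dos line\r\nunix line\n'; }

# Hold a literal CR in a variable so the pattern does not depend on
# grep understanding a \r escape.
cr=$(printf '\r')

# Count the lines whose last character before the newline is a CR.
crlf_lines=$(sample | grep -c "${cr}\$")

echo "lines ending in CR: $crlf_lines"
```

A non-zero count suggests the producer writes CR-LF and the consumer should filter explicitly.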

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]


* RE: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-08 19:57       ` Eric Blake
@ 2017-06-09  8:14         ` Soegtrop, Michael
  2017-06-09 14:06           ` cyg Simple
  0 siblings, 1 reply; 37+ messages in thread
From: Soegtrop, Michael @ 2017-06-09  8:14 UTC (permalink / raw)
  To: Eric Blake, cygwin

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2807 bytes --]

Dear Blake,

> External products were being lazy by relying on cygwin to strip CR when they
> should have stripped it themselves.  But 'sed -b' does NOT strip CR (it is the
> exact opposite, of keeping CR unstripped).

I think that there are more people who use sed for text processing in a MinGW cygwin cross environment than there are people using sed for binary data - but looking at the mailing list this might be a subjective view. The maintainers of the Linux centric SW could insert dos2unix in all stuff piped into sed, but this would only be required in this very specific build configuration, and Linux SW maintainers frequently argue why they should care about Windows at all. It can take all sorts of philosophical discussions to get such a change in. And I cannot blame them for a "we are free SW people, if you Windowers can make use of our SW without bothering us - go ahead, but please don't drag us into your mud" attitude. In the end one usually gets it done, but the effort is not negligible.

I think that sed, grep and awk are in the end text processing tools, and that they should at least have an environment option to behave like text processing tools in a mixed cygwin MinGW environment. With sed I have several hundred issues building a single application; with grep it was just one, in my scripts, which I fixed. With awk I have had no issues so far.

Btw.: I don't think that it will be easy or even possible to build detection for MinGW generated output into the cygwin dll as you suggested in a previous post. How should the receiving end of a pipe know what kind of DLLs the sending end has loaded? One could use obscure heuristics to detect "text with cr-lf line endings", but this sounds more like a nightmare than a solution. The only entity that could detect this is the shell, but then you have more than one shell (and more philosophers). As I said, I think on cygwin sed, grep and awk should have an environment option to be MinGW friendly text processors (as they used to be). Other less text centric SW should be unaffected.

Honestly my solution to the problem is to build sed from sources with CR stripping. I thought about it a day and came to the conclusion that everything else is a waste of time.

Best regards,

Michael

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-08  8:50   ` Soegtrop, Michael
                       ` (2 preceding siblings ...)
  2017-06-08 18:51     ` L A Walsh
@ 2017-06-09 13:50     ` Brian Inglis
  2017-06-09 15:05       ` Soegtrop, Michael
  3 siblings, 1 reply; 37+ messages in thread
From: Brian Inglis @ 2017-06-09 13:50 UTC (permalink / raw)
  To: cygwin

On 2017-06-08 02:50, Soegtrop, Michael wrote:
>> No, the documented behavior is that CR-LF is converted to LF only 
>> for text- mounted files; but pipelines are default binary-mounted.
>> If you want to strip CR from a pipeline, then make it explicit.
>>> var=$( prog | sed .)
>> Rewrite that to var=$( prog | tr -d '\r' | sed .)

For compatibility var=$( prog | sed -b -e 's/\r$//' ...)

> I have two problems with this:
> 
> 1.) I build many (~ 50) unix libs and tools MinGW cross on cygwin 
> from sources and this breaks many of the configure and other scripts.
> Feeding back the fixes to the individual lib/tool maintainers will
> take quite some time and also results in lengthy discussion why they
> should care about crappy DOS artefacts at all. A compatibility option
> via environment variable would have been nice.

If your makefiles configured SED you could use:
export SED='sed -b -e '\''s/\r$$//'\'''
and that is probably the best approach: change sed to SED in makefiles,
and configure SED=sed if not set.

Alternatives:
alias sed='sed -b -e '\''s/\r$//'\'''
sed(){ /bin/sed -b -e 's/\r$//' "$@"; }
sed -i 's|\<sed\>|& -b -e '\''s/\r$//'\''|g' ... to patch files - double
the "$" to patch makefiles,
or build and use a Mingw(-compatible) sed that converts \r\n to \n on
input and \n to \r\n on output?
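
One caveat with the wrappers above: once an -e option is present, sed treats its first non-option argument as a file name, so a call such as sed 's/a//' with the script given as a bare argument would no longer work. Below is a sketch of a wrapper that avoids this by filtering standard input instead. The name sed_crlf is made up here, and it only helps when sed reads from a pipe rather than from file operands:

```shell
# Hypothetical wrapper: strip CR from standard input, then run the real sed
# with the caller's arguments untouched.
sed_crlf() { tr -d '\r' | command sed "$@"; }

# Usage: the script may be passed as a bare argument, as most build scripts do.
result=$(printf 'abc\r\n' | sed_crlf 's/a//')
printf '%s\n' "$result" | od -c
```

Like the tr suggestion earlier in the thread, this removes every CR in the stream, so it is only appropriate for plain text.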

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada


* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09  8:14         ` Soegtrop, Michael
@ 2017-06-09 14:06           ` cyg Simple
  2017-06-09 15:01             ` Soegtrop, Michael
  0 siblings, 1 reply; 37+ messages in thread
From: cyg Simple @ 2017-06-09 14:06 UTC (permalink / raw)
  To: cygwin

On 6/9/2017 4:06 AM, Soegtrop, Michael wrote:
> Dear Blake,
> 
>> External products were being lazy by relying on cygwin to strip CR when they
>> should have stripped it themselves.  But 'sed -b' does NOT strip CR (it is the
>> exact opposite, of keeping CR unstripped).
> 
> I think that there are more people who use sed for text processing in a MinGW cygwin cross environment than there are people using sed for binary data - but looking at the mailing list this might be a subjective view. The maintainers of the Linux centric SW could insert dos2unix in all stuff piped into sed, but this would only be required in this very specific build configuration, and Linux SW maintainers frequently argue why they should care about Windows at all. It can take all sorts of philosophical discussions to get such a change in. And I cannot blame them for a "we are free SW people, if you Windowers can make use of our SW without bothering us - go ahead, but please don't drag us into your mud" attitude. In the end one usually gets it done, but the effort is not negligible.
> 

They have every right and I would most likely do the same if I were in
their positions.  As already explained by Erik the CR needs to be
preserved during the pipe and redirect of data.  Otherwise you corrupt
the data being used.

> I think that sed, grep and awk are in the end text processing tools, and that they should at least have an environment option to behave like text processing tools in a mixed cygwin MinGW environment. With sed I have several 100 issues building a single application, with grep it was just one in my scripts which I fixed. awk no issues so far.
> 

They may have been derived from the need to process text, but they were
also derived as *nix software where the CR wasn't an issue.  Dealing
with the data is the end user's responsibility, and if the data contains a
character that isn't required then the end user needs to remove it with
other tools.  This isn't anything new: dealing with Windows-produced
files in any *nix environment requires conversion from and to
Windows formats.

> Btw.: I don't think that it will be easy or even possible to build detection for MinGW generated output into the cygwin dll as you suggested in a previous post. How should the receiving part of a pipe know what kind of DLLs the sending part has loaded? One could have obscure heuristics to detect "text with cr-lf line endings", but this sounds more like a nightmare than a solution. The only entity that could detect this is the shell, but then you have more than one shell (and more philosophers). As I said, I think on cygwin sed, grep and awk should have an environment option to be MinGW friendly text processors (as they used to be). Other less text centric SW should be unaffected.
> 

No, there should be no such option.

> Honestly my solution to the problem is to build sed from sources with CR stripping. I thought about it a day and came to the conclusion that everything else is a waste of time.
> 

That is your choice, you could even build sed as a Windows binary
instead of a Cygwin binary; but it would be most beneficial if you
caused the stdio of your Windows applications to be in binary format
instead of text format.  Then the CR wouldn't be an issue during the
pipe process.  Why does your application's stdio need to be in text
format instead of binary format?

-- 
cyg Simple


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 14:06           ` cyg Simple
@ 2017-06-09 15:01             ` Soegtrop, Michael
  2017-06-09 15:35               ` Andrey Repin
                                 ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Soegtrop, Michael @ 2017-06-09 15:01 UTC (permalink / raw)
  To: cyg Simple, cygwin

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1552 bytes --]

Dear cyg Simple,

> but it would be most beneficial if you caused the stdio of your
> Windows applications to be in binary format instead of text format.  Then the
> CR wouldn't be an issue during the pipe process.  Why does your application's
> stdio need to be in text format instead of binary format?

It is not my application I have issues with. I am building many open source tools and libraries which are maintained by others, and as you said, these others have every right to refuse to implement Windows-specific workarounds in their tools or build scripts. Why should anybody use "wb" mode to open a file in a Linux centric app or mess around with the input of sed to remove CRs in a build script for such an app? Of course the same is true for cygwin, except that I think building MinGW apps is an intended use case for cygwin. But then it needs to have ways to deal with MinGW programs which behave like MinGW programs should, and this means ending lines with CR-LF.

Best regards,

Michael

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 13:50     ` Brian Inglis
@ 2017-06-09 15:05       ` Soegtrop, Michael
  2017-06-09 15:35         ` Andrey Repin
  0 siblings, 1 reply; 37+ messages in thread
From: Soegtrop, Michael @ 2017-06-09 15:05 UTC (permalink / raw)
  To: Brian.Inglis, cygwin

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 785 bytes --]

Dear Brian,

> alias sed='sed -b -e '\''s/\r$//'\'''

thanks, an interesting idea! Putting this into something like .bashrc might have a similar effect as having a special sed build with CR stripping built in.

Best regards,

Michael

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 15:05       ` Soegtrop, Michael
@ 2017-06-09 15:35         ` Andrey Repin
  2017-06-09 15:50           ` Dan Kegel
  2017-06-09 15:56           ` Eric Blake
  0 siblings, 2 replies; 37+ messages in thread
From: Andrey Repin @ 2017-06-09 15:35 UTC (permalink / raw)
  To: Soegtrop, Michael, cygwin

Greetings, Soegtrop, Michael!

>> alias sed='sed -b -e '\''s/\r$//'\'''

> thanks, an interesting idea! Putting this into something like .bashrc might
> have a similar effect as having a special sed build with CR stripping built in.

Except it may not work in makefiles, since make calls sed directly.


-- 
With best regards,
Andrey Repin
Friday, June 9, 2017 18:31:11

Sorry for my terrible english...



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 15:01             ` Soegtrop, Michael
@ 2017-06-09 15:35               ` Andrey Repin
  2017-06-09 15:51               ` Eric Blake
  2017-06-10 13:42               ` cyg Simple
  2 siblings, 0 replies; 37+ messages in thread
From: Andrey Repin @ 2017-06-09 15:35 UTC (permalink / raw)
  To: Soegtrop, Michael, cygwin

Greetings, Soegtrop, Michael!

> Why should anybody use "wb" mode to open a file in a Linux centric app

Because it removes ambiguity.
Everybody should use binary IO unless they explicitly want text behavior.
Not just because they don't care.


-- 
With best regards,
Andrey Repin
Friday, June 9, 2017 18:29:10

Sorry for my terrible english...



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 15:35         ` Andrey Repin
@ 2017-06-09 15:50           ` Dan Kegel
  2017-06-09 16:09             ` Soegtrop, Michael
  2017-06-09 15:56           ` Eric Blake
  1 sibling, 1 reply; 37+ messages in thread
From: Dan Kegel @ 2017-06-09 15:50 UTC (permalink / raw)
  To: cygwin; +Cc: Soegtrop, Michael

On Fri, Jun 9, 2017 at 8:31 AM, Andrey Repin <anrdaemon@yandex.ru> wrote:
> Greetings, Soegtrop, Michael!
>
>>> alias sed='sed -b -e '\''s/\r$//'\'''
>
>> thanks, an interesting idea! Putting this into something like .bashrc might
>> have a similar effect as having a special sed build with CR stripping built in.
>
> Except it may not work in makefiles, since make calls sed directly.

One could try making a wrapper shell script where sed usually lives
that adds those options and calls the real sed...


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 15:01             ` Soegtrop, Michael
  2017-06-09 15:35               ` Andrey Repin
@ 2017-06-09 15:51               ` Eric Blake
  2017-06-09 16:56                 ` Soegtrop, Michael
  2017-06-10 13:48                 ` cyg Simple
  2017-06-10 13:42               ` cyg Simple
  2 siblings, 2 replies; 37+ messages in thread
From: Eric Blake @ 2017-06-09 15:51 UTC (permalink / raw)
  To: cygwin


[-- Attachment #1.1: Type: text/plain, Size: 2119 bytes --]

On 06/09/2017 10:01 AM, Soegtrop, Michael wrote:
> Dear cyg Simple,
> 
>> but it would be most beneficial if you caused the stdio of your
>> Windows applications to be in binary format instead of text format.  Then the
>> CR wouldn't be an issue during the pipe process.  Why does your application's
>> stdio need to be in text format instead of binary format?
> 
> it is not my application I have issues with. I am building many open source tools and libraries which are maintained by others, and as you said, these others have every right to deny implementing windows specific workarounds in their tools or build scripts. Why should anybody use "wb" mode to open a file in a Linux centric app 

Using "wb" is GOOD practice in programs like tar, that WANT to produce
their output as binary no matter what.

But you are mixing things up.  "wb" is binary mode, but manipulating CRs
is only done in text mode.  Text mode is only possible with "wt" (a
Cygwin extension that doesn't work elsewhere), or with "w" (depending on
the underlying mount - on Cygwin, "w" is text on text mode mounts, and
binary on pipelines and binary mode mounts) (on other platforms, "w" is
always text mode, but indistinguishable from "wb" binary mode).

> or mess around with the input of sed to remove CRs in a build script for such an app?

If you are writing a Linux app that processes data produced on a windows
machine, then YOU must strip CR from that data (Linux sed will NOT strip
it).  So now cygwin sed does the same thing.

> Of course the same is true for cygwin, except that I think building MinGW apps is an intended use case for cygwin.

Building mingw apps may be one use of cygwin, but it is not the
"intended use case".  The intended use case of cygwin is to emulate as
much as possible of a linux environment.  If building for mingw on Linux
requires you to explicitly strip CR when dealing with data from Windows,
then so should building for mingw on Cygwin.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 15:35         ` Andrey Repin
  2017-06-09 15:50           ` Dan Kegel
@ 2017-06-09 15:56           ` Eric Blake
  1 sibling, 0 replies; 37+ messages in thread
From: Eric Blake @ 2017-06-09 15:56 UTC (permalink / raw)
  To: cygwin


[-- Attachment #1.1: Type: text/plain, Size: 559 bytes --]

On 06/09/2017 10:31 AM, Andrey Repin wrote:
> Greetings, Soegtrop, Michael!
> 
>>> alias sed='sed -b -e '\''s/\r$//'\'''
> 
>> thanks, an interesting idea! Putting this into something like .bashrc might
>> have a similar effect as having a special sed build with CR stripping built in.
> 
> Except it may not work in makefiles, since make calls sed directly.

Then make it a script that you put first on your $PATH.
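
A minimal sketch of such a wrapper, under some assumptions. The directory comes from `mktemp` only so that the example is self-contained; in practice any directory that is first on $PATH would do. Instead of adding `-b -e 's/\r$//'`, this variant filters piped input through `tr -d '\r'`, because once an `-e` option is present sed treats a script given as the first plain argument (as in `sed 's/a//'`) as an input file name.

```sh
#!/bin/sh
# Remember where the real sed lives before PATH is changed.
real_sed=$(command -v sed)

# Hypothetical location for the wrapper; any directory that comes
# first on PATH would do.
wrapper_dir=$(mktemp -d)

# Write the wrapper. \$ and \\ are escaped so that they reach the
# generated script literally; $real_sed is expanded now.
cat > "$wrapper_dir/sed" <<EOF
#!/bin/sh
# Strip CR only when stdin is a pipe (the "prog | sed ..." case);
# otherwise hand over to the real sed unchanged.
if [ -p /dev/stdin ]; then
    tr -d '\\r' | "$real_sed" "\$@"
else
    exec "$real_sed" "\$@"
fi
EOF
chmod +x "$wrapper_dir/sed"

# From here on, plain "sed" resolves to the wrapper, also for make.
PATH="$wrapper_dir:$PATH"
export PATH

printf 'a\r\nb\r\nc\r\n' | sed 's/a//' | od -An -c
```

This sketch has limits: it does not touch CRs in files named on the command line, and when stdin is a pipe it consumes that pipe even if sed was given file operands.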

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 15:50           ` Dan Kegel
@ 2017-06-09 16:09             ` Soegtrop, Michael
  2017-06-10 21:09               ` Brian Inglis
  0 siblings, 1 reply; 37+ messages in thread
From: Soegtrop, Michael @ 2017-06-09 16:09 UTC (permalink / raw)
  To: Dan Kegel, cygwin

Dear Dan,
 
> One could try making a wrapper shell script where sed usually lives that adds
> those options and calls the real sed...

I tried to do exactly this, but I tried to pipe a dos2unix command in between. It got a bit complicated because I had to parse the sed command line arguments. The solution of adding an extra command with -e is much more elegant. And you are right, replacing sed with a shell script is better than using an alias.

But the -e method won't work for grep, and for awk it won't work in all cases.
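
A sketch of what the nearest equivalents could look like, assuming the awk program itself can be edited: awk can drop the trailing CR in an extra first rule, while grep has no comparable place to put it, so its input has to be filtered. The function `mingw_output` and its two lines of text are made up for illustration.

```sh
#!/bin/sh
# Hypothetical stand-in for a MinGW tool that writes CR-LF terminated lines.
mingw_output() { printf 'name: coq\r\nversion: 8.6\r\n'; }

# awk: an extra first rule removes the CR from each record, after
# which $2 no longer carries it.
version=$(mingw_output | awk '{ sub(/\r$/, "") } $1 == "version:" { print $2 }')

# grep: filter the stream first; without tr, -x (whole-line match)
# would fail because of the trailing CR.
match=$(mingw_output | tr -d '\r' | grep -x 'version: 8.6')

printf '%s\n' "$version" "$match"
```

The awk variant only helps where the awk program text is under the caller's control, which is exactly what is not the case for awk calls buried in third-party scripts.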

Best regards,

Michael



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 15:51               ` Eric Blake
@ 2017-06-09 16:56                 ` Soegtrop, Michael
  2017-06-09 18:42                   ` Hans-Bernhard Bröker
  2017-06-10 13:48                 ` cyg Simple
  1 sibling, 1 reply; 37+ messages in thread
From: Soegtrop, Michael @ 2017-06-09 16:56 UTC (permalink / raw)
  To: Eric Blake, cygwin

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 3893 bytes --]

Dear Eric,

> Building mingw apps may be one use of cygwin, but it is not the "intended use
> case". 

I said "an" intended use case, not "the". I also use cygwin for a lot of other things.

> If you are writing a Linux app that processes data produced on a windows
> machine, then YOU must strip CR from that data (Linux sed will NOT strip it).
> So now cygwin sed does the same thing.

Maybe my situation gets clear when I describe what my use case is: I maintain the scripts which build the Windows installers for Coq - a proof assistant with a GTK based UI. This starts building a MinGW OCaml compiler from sources. I could also build a host=cygwin target=MinGW OCaml compiler, to get around the issue that messages it produces contain CRs, but OCaml doesn't support host!=target very well, and many of the tools it produces I also have to run under cygwin. So I would need 3 OCaml compilers, one build=cygwin, host=cygwin, target=cygwin to build the tools I run under cygwin during the build, then one build=cygwin, host=cygwin, target=MinGW to build the tools I finally need for plain Windows. But since Coq also needs OCaml at run time I also would have to build a build=cygwin, host=MinGW, target=MinGW variant. Many of the tools and libs I need have similar issues.
 
I admit I am lazy and build only one OCaml compiler, the build=cygwin, host=MinGW, target=MinGW variant, because I will need this eventually anyway. The only issue this results in is that its text messages (like version or lib path info) contain carriage returns.

All the OCaml and Coq build scripts (and the build scripts for the libraries and tools I need) themselves are not maintained by me. I just maintain the wrapper which sets up a fresh cygwin, gets all the sources, builds the whole stuff and finally produces a Windows installer for others to use. I call many scripts and tools which are originally designed to run on Linux and are maintained by Linux centric people and I do not want to bother them with such issues.

Cross building on Linux would be much more complicated: there I would need not only 3 OCaml variants in any case but also 2 Coq variants, because compiling the Coq libraries requires running Coq on the build host as well. With cygwin I can call the MinGW variants.

All this did work remarkably well on cygwin, because cygwin had support to call MinGW programs and to process their text oriented output.

I don't see another way than having sed strip away the CRs. It doesn't make sense to build programs intended to be run under plain Windows such that they do not produce CRs. It doesn't make sense to hack around in build scripts and source code of Linux centric tools just to get this Windows build done. The true way would be to support host!=target in OCaml and to make 2..3 host/target build variants of everything. But it would take ages, waste a lot of energy (think of the CI test servers), slow down development, ... and all this just to get rid of maybe 200 carriage return characters in strings communicated in a build process.

For me, cygwin is a pragmatic tool to get the job done. And it does, perfectly and better than I expected when I started this, except that current cygwin's text tools can't process text produced by MinGW tools out of the box any more.

Best regards,

Michael

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 16:56                 ` Soegtrop, Michael
@ 2017-06-09 18:42                   ` Hans-Bernhard Bröker
  2017-06-09 19:30                     ` Erik Soderquist
  2017-06-09 22:28                     ` Soegtrop, Michael
  0 siblings, 2 replies; 37+ messages in thread
From: Hans-Bernhard Bröker @ 2017-06-09 18:42 UTC (permalink / raw)
  To: cygwin

Am 09.06.2017 um 18:56 schrieb Soegtrop, Michael:

> Maybe my situation gets clear when I describe what my use case is: I
> maintain the scripts which build the Windows installers for Coq - a
> proof assistant with a GTK based UI. This starts building a MinGW
> OCaml compiler from sources. 

You're doing this via Cygwin, i.e. on a Windows machine, where MinGW is 
a _native_ toolchain.  That begs the question: why are you doing a cross 
build in the first place?

Cross building generally means that _no_ programs built for the target 
host may ever be run by the build scripts, because they simply won't 
work.  Any attempt to do so is either a severe bug in the existing build 
setup, or the consequence of trying to coerce a build script that just 
cannot handle cross-building into doing it anyway.

So you shouldn't even be getting to any place where the output of a 
cross-target (i.e. MinGW) executable is run on the build host, and its 
output piped into a build platform (i.e. Cygwin) tool.

That means what you're trying to argue here is that an evident 
short-coming of some build setups should be fixed by breaking Cygwin 
pipes' mode of operation for everyone.  Sorry, but I don't see that 
happening.

> Cross building on Linux would be much more complicated, because on
> Linux I would no matter what need 3 OCaml variants but also 2 Coq
> variants, because I need to compile the Coq libraries, so I need to
> run Coq on the build host as well. With cygwin I can call the MinGW
> variants.

Indeed.  That is a lucky _exception_ to what you can do in a cross 
build.  But you have to pay a price for this unprecedented luxury.  The 
price is that you have to massage the output of those MinGW programs to 
follow Cygwin rules, or change the way those MinGW programs are built to 
make them follow the rules by themselves.

> I don't see another way than having sed strip away the CRs. It
> doesn't make sense to build programs intended to be run under plain
> Windows such that they do not produce CRs. 

I believe it makes much more sense than you think.  Hardly any Windows 
tool worth using actually _needs_ those CRs in the first place.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 18:42                   ` Hans-Bernhard Bröker
@ 2017-06-09 19:30                     ` Erik Soderquist
  2017-06-09 22:28                     ` Soegtrop, Michael
  1 sibling, 0 replies; 37+ messages in thread
From: Erik Soderquist @ 2017-06-09 19:30 UTC (permalink / raw)
  To: cygwin

On Fri, Jun 9, 2017 at 2:42 PM, Hans-Bernhard Bröker wrote:
>> I don't see another way than having sed strip away the CRs. It
>> doesn't make sense to build programs intended to be run under plain
>> Windows such that they do not produce CRs.
>
>
> I believe it makes much more sense than you think.  Hardly any Windows tool
> worth using actually _needs_ those CRs in the first place.

I agree heartily with this; the only Windows tool I use that is
actually dependent on CR is notepad, and usually that is only for a
lazy verification that CR is/isn't present.

In everything I've written for dealing with Windows CR-infested output
in *nix environments, I strip all CRs as my first step and do not put
CRs in my output; so far there have been no complaints from end users
about the missing CRs.

-- Erik


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 18:42                   ` Hans-Bernhard Bröker
  2017-06-09 19:30                     ` Erik Soderquist
@ 2017-06-09 22:28                     ` Soegtrop, Michael
  2017-06-09 22:43                       ` Harvey Stein
  2017-06-09 23:16                       ` Ray Donnelly
  1 sibling, 2 replies; 37+ messages in thread
From: Soegtrop, Michael @ 2017-06-09 22:28 UTC (permalink / raw)
  To: Hans-Bernhard Bröker, cygwin

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 3205 bytes --]

Dear Hans-Bernhard,

> You're doing this via Cygwin, i.e. on a Windows machine, where MinGW is a
> _native_ toolchain.  That begs the question: why are you doing a cross build in
> the first place?

I simply couldn't find another way to build 50 configure / make style libraries and tools on Windows. If there is a method I haven't heard of, please let me know. MSYS and MSYS2 couldn't build a single one of these libraries or tools without huge patches, while doing the same on cygwin used to work out of the box.

> So you shouldn't even be getting to any place where the output of a cross-
> target (i.e. MinGW) executable is run on the build host, and its output piped
> into a build platform (i.e. Cygwin) tool.

I think it is quite common to build MinGW tools on cygwin and then run these MinGW tools from cygwin, e.g. for regression testing and to process their output in cygwin e.g. to get a test verdict.
 
> That means what you're trying to argue here is that an evident short-coming of
> some build setups should be fixed by breaking Cygwin pipes' mode of operation
> for everyone.  Sorry, but I don't see that happening.

I think that arguing that one should not run MinGW programs from cygwin because they are entirely different operating systems, and that a build system which does this has inherent shortcomings, is somehow neglecting reality. Why shouldn't one do this if the only problem is CRs?

I also didn't ask to mess up Cygwin's pipe system. All I ask for is that tools which are documented as text processing tools like sed have an environment option to be MinGW friendly. I think people who are using sed for binary files should use the -b option - if only to document that they are doing this intentionally.

> > I don't see another way than having sed strip away the CRs. It doesn't
> > make sense to build programs intended to be run under plain Windows
> > such that they do not produce CRs.
> 
> I believe it makes much more sense than you think.  Hardly any Windows tool
> worth using actually _needs_ those CRs in the first place.

That is true - just, as I said many times, all these tools are out of my control and I do not want to tell people to open streams used for text output with "wb" just to get around pure Windows artefacts. Many people on this list said that the maintainers of such SW have every right to reject such change requests and I agree. Please think about what you are asking me to do here: filing 200 fairly useless change requests against 50 Linux centric tools and libraries. Sorry, I am not going to do this. I would rather beg here for cygwin to be MinGW friendly.

Best regards,

Michael


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 22:28                     ` Soegtrop, Michael
@ 2017-06-09 22:43                       ` Harvey Stein
  2017-06-09 23:16                       ` Ray Donnelly
  1 sibling, 0 replies; 37+ messages in thread
From: Harvey Stein @ 2017-06-09 22:43 UTC (permalink / raw)
  To: cygwin; +Cc: Hans-Bernhard Bröker

I just had to debug & fix a problem from the sed behavior change that
has little to do with MinGW.

I'm running a native version of Emacs and have makefiles, lots of
scripts, and various commands in my makefiles for extracting
dependencies.  The commands use sed, gawk, ..., and started creating
bad dependency files because of carriage returns showing up in the
last field when filtering with sed & gawk.

It took a while to track down, especially because the command was
working properly when run directly in a cygwin terminal running bash,
but gave different behavior when make ran the command from inside of
Emacs.

I could imagine this causing all sorts of trouble for people in general.
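
One way to spot this kind of problem early is to check a command's output for CR bytes directly. A small sketch follows; the helper name `has_cr` and the sample dependency line are made up here.

```sh
#!/bin/sh
# Succeeds if the data on stdin contains at least one CR byte.
# tr -cd '\r' deletes everything except CR; command substitution
# only strips trailing newlines, so any CR survives into the test.
has_cr() { [ -n "$(tr -cd '\r')" ]; }

# Example: the first stream has CR-LF endings, the second plain LF.
if printf 'dep.o: dep.c\r\n' | has_cr; then echo "CR found"; fi
if printf 'dep.o: dep.c\n'   | has_cr; then echo "CR found"; else echo "no CR"; fi
```

Running the same check both from an interactive shell and from the make rule would show whether the two environments really see different bytes.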


On Fri, Jun 9, 2017 at 6:28 PM, Soegtrop, Michael
<michael.soegtrop@intel.com> wrote:
> Dear Hans-Bernhard,
>
>> You're doing this via Cygwin, i.e. on a Windows machine, where MinGW is a
>> _native_ toolchain.  That begs the question: why are you doing a cross build in
>> the first place?
>
> I simply couldn't find another way to build 50 configure / make style libraries and tools on Windows. If there is a method I haven't heard of, please let me know. MSYS and MSYS2 couldn't build a single one of these libraries or tools without huge patches, while doing the same on cygwin used to work out of the box.
>
>> So you shouldn't even be getting to any place where the output of a cross-
>> target (i.e. MinGW) executable is run on the build host, and its output piped
>> into a build platform (i.e. Cygwin) tool.
>
> I think it is quite common to build MinGW tools on cygwin and then run these MinGW tools from cygwin, e.g. for regression testing and to process their output in cygwin e.g. to get a test verdict.
>
>> That means what you're trying to argue here is that an evident short-coming of
>> some build setups should be fixed by breaking Cygwin pipes' mode of operation
>> for everyone.  Sorry, but I don't see that happening.
>
> I think that arguing that one should not run MinGW programs from cygwin because they are entirely different operating systems and that a build system which does this has inherent short-comings is somehow neglecting reality. Why shouldn’t one do this if the only problem is CRs?
>
> I also didn't ask to mess up Cygwin's pipe system. All I ask for is that tools which are documented as text processing tools like sed have an environment option to be MinGW friendly. I think people who are using sed for binary files should use the -b option - if only to document that they are doing this intentionally.
>
>> > I don't see another way than having sed strip away the CRs. It doesn't
>> > make sense to build programs intended to be run under plain Windows
>> > such that they do not produce CRs.
>>
>> I believe it makes much more sense than you think.  Hardly any Windows tool
>> worth using actually _needs_ those CRs in the first place.
>
> That is true - just, as I said many times, all these tools are out of my control and I do not want to tell people to open streams used for text output with "wb" just to get around pure Windows artefacts. Many people on this list said that the maintainers of such SW have every right to reject such change requests and I agree. Please think about what you are asking me here to do: filing 200 fairly useless change requests against 50 Linux centric tools and libraries. Sorry, I am not going to do this. I would rather beg here for MinGW friendliness.
>
> Best regards,
>
> Michael



-- 
Harvey J. Stein
hjstein@gmail.com
http://www.linkedin.com/in/harveyjstein
Selected papers and presentations available at: http://ssrn.com/author=732372


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 22:28                     ` Soegtrop, Michael
  2017-06-09 22:43                       ` Harvey Stein
@ 2017-06-09 23:16                       ` Ray Donnelly
  1 sibling, 0 replies; 37+ messages in thread
From: Ray Donnelly @ 2017-06-09 23:16 UTC (permalink / raw)
  To: cygwin; +Cc: Hans-Bernhard Bröker

On Fri, Jun 9, 2017 at 11:28 PM, Soegtrop, Michael
<michael.soegtrop@intel.com> wrote:
> Dear Hans-Bernhard,
>
>> You're doing this via Cygwin, i.e. on a Windows machine, where MinGW is a
>> _native_ toolchain.  That begs the question: why are you doing a cross build in
>> the first place?
>
> I simply couldn't find another way to build 50 configure / make style libraries and tools on Windows. If there is a method I haven't heard of, please let me know. MSYS and MSYS2 couldn't build a single one of these libraries or tools without huge patches while doing the same on cygwin used to work out of the box.

Sorry to go a little off-topic, can you give me a list of some of
these libraries and tools that can be built for a mingw-w64 host but
which don't build on MSYS2 without huge patches? The stuff you are
describing is the raison d'être for MSYS2 (and MSYS but I don't care
about MSYS). Feel free to take this off-list.

>
>> So you shouldn't even be getting to any place where the output of a cross-
>> target (i.e. MinGW) executable is run on the build host, and its output piped
>> into a build platform (i.e. Cygwin) tool.
>
> I think it is quite common to build MinGW tools on cygwin and then run these MinGW tools from cygwin, e.g. for regression testing and to process their output in cygwin e.g. to get a test verdict.
>
>> That means what you're trying to argue here is that an evident short-coming of
>> some build setups should be fixed by breaking Cygwin pipes' mode of operation
>> for everyone.  Sorry, but I don't see that happening.
>
> I think that arguing that one should not run MinGW programs from cygwin because they are entirely different operating systems and that a build system which does this has inherent short-comings is somehow neglecting reality. Why shouldn’t one do this if the only problem is CRs?
>
> I also didn't ask to mess up Cygwin's pipe system. All I ask for is that tools which are documented as text processing tools like sed have an environment option to be MinGW friendly. I think people who are using sed for binary files should use the -b option - if only to document that they are doing this intentionally.
>
>> > I don't see another way than having sed strip away the CRs. It doesn't
>> > make sense to build programs intended to be run under plain Windows
>> > such that they do not produce CRs.
>>
>> I believe it makes much more sense than you think.  Hardly any Windows tool
>> worth using actually _needs_ those CRs in the first place.
>
> That is true - just, as I said many times, all these tools are out of my control and I do not want to tell people to open streams used for text output with "wb" just to get around pure Windows artefacts. Many people on this list said that the maintainers of such SW have every right to reject such change requests and I agree. Please think about what you are asking me here to do: filing 200 fairly useless change requests against 50 Linux centric tools and libraries. Sorry, I am not going to do this. I would rather beg here for MinGW friendliness.
>
> Best regards,
>
> Michael


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 15:01             ` Soegtrop, Michael
  2017-06-09 15:35               ` Andrey Repin
  2017-06-09 15:51               ` Eric Blake
@ 2017-06-10 13:42               ` cyg Simple
  2 siblings, 0 replies; 37+ messages in thread
From: cyg Simple @ 2017-06-10 13:42 UTC (permalink / raw)
  To: cygwin



On 6/9/2017 11:01 AM, Soegtrop, Michael wrote:
> Dear cyg Simple,
> 
>> but it would be most beneficial if you caused the stdio of your
>> Windows applications to be in binary format instead of text format.  Then the
>> CR wouldn't be an issue during the pipe process.  Why does your application's
>> stdio need to be in text format instead of binary format?
> 
> it is not my application I have issues with. I am building many open source tools and libraries which are maintained by others, and as you said, these others have every right to deny implementing windows specific workarounds in their tools or build scripts. Why should anybody use "wb" mode to open a file in a Linux centric app or mess around with the input of sed to remove CRs in a build script for such an app? Of course the same is true for cygwin, except that I think building MinGW apps is an intended use case for cygwin. But then it needs to have ways to deal with MinGW programs which behave like MinGW programs should, and this means ending lines with CR-LF.

I see the conversation is already getting a bit lengthy, and many
opinions exist.  I didn't say to use 'wb' with every open.  I said to set
the stdio streams (i.e. stdin, stdout, stderr) to binary mode.  See the
documentation for the MinGW lib/binmode.o object, used when linking the
binary.

As for building MinGW apps being an intended use case for Cygwin, why do
you think it was forked as MSYS?  Cygwin has never dealt entirely
smoothly with cross compatibility between Windows and Cygwin itself.
If you wish to use Cygwin as is for building MinGW tools, then you'll
need to set your standards to work within that environment.

In another follow-up email you explain your use case; I suggest that
MSYS is the better fit for that use case.  Check into it.

-- 
cyg Simple


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 15:51               ` Eric Blake
  2017-06-09 16:56                 ` Soegtrop, Michael
@ 2017-06-10 13:48                 ` cyg Simple
  2017-06-10 14:21                   ` Hans-Bernhard Bröker
  2017-06-11  2:31                   ` Eric Blake
  1 sibling, 2 replies; 37+ messages in thread
From: cyg Simple @ 2017-06-10 13:48 UTC (permalink / raw)
  To: cygwin



On 6/9/2017 11:51 AM, Eric Blake wrote:
> On 06/09/2017 10:01 AM, Soegtrop, Michael wrote:
>> Dear cyg Simple,
>>
>>> but it would be most beneficial if you caused the stdio of your
>>> Windows applications to be in binary format instead of text format.  Then the
>>> CR wouldn't be an issue during the pipe process.  Why does your application's
>>> stdio need to be in text format instead of binary format?
>>
>> it is not my application I have issues with. I am building many open source tools and libraries which are maintained by others, and as you said, these others have every right to deny implementing windows specific workarounds in their tools or build scripts. Why should anybody use "wb" mode to open a file in a Linux centric app 
> 
> Using "wb" is GOOD practice in programs like tar, that WANT to produce
> their output as binary no matter what.
> 
> But you are mixing things up.  "wb" is binary mode, but manipulating CRs
> is only done in text mode.  Text mode is only possible with "wt" (a
> Cygwin extension that doesn't work elsewhere), or with "w" (depending on
> the underlying mount - on Cygwin, "w" is text on text mode mounts, and
> binary on pipelines and binary mode mounts) (on other platforms, "w" is
> always text mode, but indistinguishable from "wb" binary mode).
> 

Uhm, 'wt' and 'wb' came from MS itself.  GNU GCC was adapted to allow it
and just ignores it on systems that don't need it.

>> or mess around with the input of sed to remove CRs in a build script for such an app?
> 
> If you are writing a Linux app that processes data produced on a windows
> machine, then YOU must strip CR from that data (Linux sed will NOT strip
> it).  So now cygwin sed does the same thing.
> 
>> Of course the same is true for cygwin, except that I think building MinGW apps is an intended use case for cygwin.
> 
> Building mingw apps may be one use of cygwin, but it is not the
> "intended use case".  The intended use case of cygwin is to emulate as
> much as possible of a linux environment.  If building for mingw on Linux
> requires you to explicitly strip CR when dealing with data from Windows,
> then so should building for mingw on Cygwin.
> 

Exactly.  The other option is to use MSYS where this issue is dealt with.

-- 
cyg Simple


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-10 13:48                 ` cyg Simple
@ 2017-06-10 14:21                   ` Hans-Bernhard Bröker
  2017-06-11  2:31                   ` Eric Blake
  1 sibling, 0 replies; 37+ messages in thread
From: Hans-Bernhard Bröker @ 2017-06-10 14:21 UTC (permalink / raw)
  To: cygwin

Am 10.06.2017 um 15:48 schrieb cyg Simple:

> Uhm, 'wt' and 'wb' came from MS itself.  GNU GCC was adapted to allow it
> and just ignores it on systems that don't need it.

Not really.  Only "wt" is a DOS-/Windows-ism, while "wb" is part of the 
standard library, and has been ever since there has been a standard.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-09 16:09             ` Soegtrop, Michael
@ 2017-06-10 21:09               ` Brian Inglis
  0 siblings, 0 replies; 37+ messages in thread
From: Brian Inglis @ 2017-06-10 21:09 UTC (permalink / raw)
  To: cygwin

On 2017-06-09 10:09, Soegtrop, Michael wrote:
>> One could try making a wrapper shell script where sed usually
>> lives that adds those options and calls the real sed...
> I tried to do exactly this, but I tried to pipe a dos2unix command
> in between. It got a bit complicated because I had to parse the sed
> command line arguments. The solution of adding an extra command with
> -e is much more elegant. And you are right, replacing sed with a
> shell script is better than using an alias.

There is one issue here with sed, complicating simple aliasing or
substitution, requiring a shell function or script be used in the
general case.

If -e is used, then any inline script argument must be preceded by -e.

A simple alias, requiring an inline script argument, would have to be:
	alias sed='sed -es/\\r\$// -e'

A shell function or script has to examine arguments to see if they are
options and if they have arguments, and whether any option is -e,
--expression or -f, --file.
Options and their arguments are skipped, and if -[ef] is not seen, the
next argument is an inline script, and must be preceded by -e.

For portable and safe scripts, explicit use of -e before inline script
arguments, and -- to end options, before input files, is recommended for
that reason.
And for makefiles, use a SED variable, defaulting to SED=sed or /bin/sed
if not defined.
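
The argument scan described above might be sketched roughly as follows. This is an illustration only, not a complete option parser: `sed_nocr` and `_sed_short_kind` are hypothetical names, GNU sed is assumed (for `\r` in the script), only the common short and long options are recognised, and abbreviated long options are not handled. Note also that with -i the added expression would rewrite the files themselves.

```shell
# Classify one short-option cluster such as -n, -ne or -l (GNU sed flags assumed).
_sed_short_kind() {
    rest=${1#-}
    # Skip flags that take no argument.
    while :; do
        case $rest in
            [nErsuzb]*) rest=${rest#?} ;;
            *) break ;;
        esac
    done
    case $rest in
        e|f)   echo script-next ;;  # script or script file is the next argument
        e*|f*) echo script ;;       # script or script file is attached
        l)     echo arg-next ;;     # -l takes its number from the next argument
        *)     echo flag ;;         # plain flags, -i[SUFFIX], -lN
    esac
}

# Run sed with a CR-stripping expression placed ahead of the caller's script.
sed_nocr() {
    strip='s/\r$//'
    has_script=no
    skip=no
    # Pass 1: is there any -e/--expression or -f/--file before "--"?
    for arg in "$@"; do
        if [ "$skip" = yes ]; then skip=no; continue; fi
        case $arg in
            --) break ;;
            --expression=*|--file=*) has_script=yes ;;
            --expression|--file) has_script=yes; skip=yes ;;
            --line-length) skip=yes ;;
            --*) ;;
            -?*)
                case $(_sed_short_kind "$arg") in
                    script) has_script=yes ;;
                    script-next) has_script=yes; skip=yes ;;
                    arg-next) skip=yes ;;
                esac ;;
        esac
    done
    # Pass 2: without -e/-f, the first non-option argument is the inline
    # script, so rotate through the arguments and put -e in front of it.
    if [ "$has_script" = no ]; then
        n=$#
        pending=yes
        skip=no
        while [ "$n" -gt 0 ]; do
            arg=$1; shift; n=$((n - 1))
            if [ "$pending" = yes ]; then
                if [ "$skip" = yes ]; then
                    skip=no
                else
                    case $arg in
                        --)
                            if [ "$n" -gt 0 ]; then
                                # The script follows "--"; move it in front.
                                script=$1; shift; n=$((n - 1))
                                set -- "$@" -e "$script" --
                                pending=no
                                continue
                            fi ;;
                        --line-length) skip=yes ;;
                        --*) ;;
                        -?*) [ "$(_sed_short_kind "$arg")" = arg-next ] && skip=yes ;;
                        *) set -- "$@" -e; pending=no ;;
                    esac
                fi
            fi
            set -- "$@" "$arg"
        done
    fi
    command sed -e "$strip" "$@"
}
```

With this in place, `prog | sed_nocr 's/a//'` and `prog | sed_nocr -n -e '/b/p'` both see lines without the trailing CR, while callers that already pass -e or -f are left as they are apart from the extra leading expression.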

> But the -e method won't work for grep and for awk not in all cases.

Then you have to explicitly use tr, sed, or d2u/dos2unix to pre-process
Windows input or post-process Windows output.
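
A minimal sketch of such pre-processing, assuming the producer writes CR-LF: `strip_cr` and `run_test` are hypothetical names, and `run_test` merely stands in for a MinGW program.

```shell
# Hypothetical helper: delete every CR byte so LF-only tools see clean lines.
# Note this removes CRs anywhere in the stream, not only before a newline.
strip_cr() { tr -d '\r'; }

# printf stands in for a MinGW test program that writes CR-LF line endings.
run_test() { printf 'ok 1\r\nfail 2\r\n'; }

# Without strip_cr the anchored pattern would not match, since "ok 1" is
# followed by a CR.
run_test | strip_cr | grep -c '^ok 1$'

# awk sees a clean last field as well.
run_test | strip_cr | awk '$1 == "fail" { print $NF }'
```

If CRs inside a line must be preserved, `sed 's/\r$//'` (GNU sed) or dos2unix in place of tr would be the more careful choice.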

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-10 13:48                 ` cyg Simple
  2017-06-10 14:21                   ` Hans-Bernhard Bröker
@ 2017-06-11  2:31                   ` Eric Blake
  2017-06-13 14:11                     ` cyg Simple
  1 sibling, 1 reply; 37+ messages in thread
From: Eric Blake @ 2017-06-11  2:31 UTC (permalink / raw)
  To: cygwin



On 06/10/2017 08:48 AM, cyg Simple wrote:

> 
> Uhm, 'wt' and 'wb' came from MS itself.

Not quite. fopen(,"wb") comes from POSIX.  "wt" is probably a microsoft
extension, but it is certainly not in POSIX nor in glibc.

>  GNU GCC was adapted to allow it

Huh? It's not whether the compiler allows it, but whether libc allows
it.  ALL libc that are remotely close to POSIX compliant support
fopen(,"wb"), but only Windows platforms (and NOT glibc) support
fopen(,"wt").

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-11  2:31                   ` Eric Blake
@ 2017-06-13 14:11                     ` cyg Simple
  2017-06-13 17:34                       ` Brian Inglis
  0 siblings, 1 reply; 37+ messages in thread
From: cyg Simple @ 2017-06-13 14:11 UTC (permalink / raw)
  To: cygwin

On 6/10/2017 10:30 PM, Eric Blake wrote:
> On 06/10/2017 08:48 AM, cyg Simple wrote:
> 
>>
>> Uhm, 'wt' and 'wb' came from MS itself.
> 
> Not quite. fopen(,"wb") comes from POSIX.  "wt" is probably a microsoft
> extension, but it is certainly not in POSIX nor in glibc.
> 

I think it's a C standard so it should be in glibc.  It may be mentioned
in the POSIX standard as in support of the C standard.

>>  GNU GCC was adapted to allow it
> 
> Huh? It's not whether the compiler allows it, but whether libc allows
> it.  ALL libc that are remotely close to POSIX compliant support
> fopen(,"wb"), but only Windows platforms (and NOT glibc) support
> fopen(,"wt").
> 

Looking at http://www.cplusplus.com/reference/cstdio/fopen/ I see:

"If additional characters follow the sequence, the behavior depends on
the library implementation: some implementations may ignore additional
characters so that for example an additional "t" (sometimes used to
explicitly state a text file) is accepted."

There is also a lot of discussion about the topic at:

https://stackoverflow.com/questions/229924/difference-between-files-writen-in-binary-and-text-mode

As for glibc, it will just ignore the extra character but it allows the
use of "wt"; it just means nothing to that C runtime library. It does
aid in portable code though.

As for me conflating GCC with a C runtime - please forgive my lapse in
memory.

-- 
cyg Simple


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-13 14:11                     ` cyg Simple
@ 2017-06-13 17:34                       ` Brian Inglis
  2017-06-14 16:07                         ` cyg Simple
  0 siblings, 1 reply; 37+ messages in thread
From: Brian Inglis @ 2017-06-13 17:34 UTC (permalink / raw)
  To: cygwin

On 2017-06-13 08:11, cyg Simple wrote:
> On 6/10/2017 10:30 PM, Eric Blake wrote:
>> On 06/10/2017 08:48 AM, cyg Simple wrote:
>>> Uhm, 'wt' and 'wb' came from MS itself.
>> Not quite. fopen(,"wb") comes from POSIX.  "wt" is probably a microsoft
>> extension, but it is certainly not in POSIX nor in glibc.
> I think it's a C standard so it should be in glibc.  It may be mentioned
> in the POSIX standard as in support of the C standard.
>>>  GNU GCC was adapted to allow it
>> Huh? It's not whether the compiler allows it, but whether libc allows
>> it.  ALL libc that are remotely close to POSIX compliant support
>> fopen(,"wb"), but only Windows platforms (and NOT glibc) support
>> fopen(,"wt").
> Looking at http://www.cplusplus.com/reference/cstdio/fopen/ I see:
> "If additional characters follow the sequence, the behavior depends on
> the library implementation: some implementations may ignore additional
> characters so that for example an additional "t" (sometimes used to
> explicitly state a text file) is accepted."
> There is also a lot of discussion about the topic at:
> https://stackoverflow.com/questions/229924/difference-between-files-writen-in-binary-and-text-mode
> As for glibc, it will just ignore the extra character but it allows the
> use of "wt"; it just means nothing to that C runtime library. It does
> aid in portable code though.
> As for me conflating GCC with a C runtime - please forgive my lapse in
> memory.

There's no need for open mode "t", as text is the default mode unless
"b" is specified, and assuming you use "cooked" line I/O functions like
fgets/fputs, not "raw" binary I/O like fread/fwrite; fscanf ignores all
line terminators unless you use formats like "%c" which could see them.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-13 17:34                       ` Brian Inglis
@ 2017-06-14 16:07                         ` cyg Simple
  2017-06-14 19:04                           ` Brian Inglis
  0 siblings, 1 reply; 37+ messages in thread
From: cyg Simple @ 2017-06-14 16:07 UTC (permalink / raw)
  To: cygwin

On 6/13/2017 1:34 PM, Brian Inglis wrote:
> On 2017-06-13 08:11, cyg Simple wrote:
>> On 6/10/2017 10:30 PM, Eric Blake wrote:
>>> On 06/10/2017 08:48 AM, cyg Simple wrote:
>>>> Uhm, 'wt' and 'wb' came from MS itself.
>>> Not quite. fopen(,"wb") comes from POSIX.  "wt" is probably a microsoft
>>> extension, but it is certainly not in POSIX nor in glibc.
>> I think it's a C standard so it should be in glibc.  It may be mentioned
>> in the POSIX standard as in support of the C standard.
>>>>  GNU GCC was adapted to allow it
>>> Huh? It's not whether the compiler allows it, but whether libc allows
>>> it.  ALL libc that are remotely close to POSIX compliant support
>>> fopen(,"wb"), but only Windows platforms (and NOT glibc) support
>>> fopen(,"wt").
>> Looking at http://www.cplusplus.com/reference/cstdio/fopen/ I see:
>> "If additional characters follow the sequence, the behavior depends on
>> the library implementation: some implementations may ignore additional
>> characters so that for example an additional "t" (sometimes used to
>> explicitly state a text file) is accepted."
>> There is also a lot of discussion about the topic at:
>> https://stackoverflow.com/questions/229924/difference-between-files-writen-in-binary-and-text-mode
>> As for glibc, it will just ignore the extra character but it allows the
>> use of "wt"; it just means nothing to that C runtime library. It does
>> aid in portable code though.
>> As for me conflating GCC with a C runtime - please forgive my lapse in
>> memory.
> 
> There's no need for open mode "t", as text is the default mode unless
> "b" is specified, and assuming you use "cooked" line I/O functions like
> fgets/fputs, not "raw" binary I/O like fread/fwrite; fscanf ignores all
> line terminators unless you use formats like "%c" which could see them.
> 

That isn't exactly true: according to MSDN[1], "t" also governs handling
of the CTRL-Z EOF marker.  However, I agree that it is worthless.
Regardless, the C standard states that "t" or any other extra character
can be added, and it is left to the implementing library to interpret or
ignore it.  If the C runtime library neither uses nor ignores it, then it
isn't complying with the C standard.

[1] https://msdn.microsoft.com/en-us/library/yeby3zcb(v=vs.140).aspx

"t
Open in text (translated) mode. In this mode, CTRL+Z is interpreted as
an EOF character on input. In files that are opened for reading/writing
by using "a+", fopen checks for a CTRL+Z at the end of the file and
removes it, if it is possible. This is done because using fseek and
ftell to move within a file that ends with CTRL+Z may cause fseek to
behave incorrectly near the end of the file."

-- 
cyg Simple


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts
  2017-06-14 16:07                         ` cyg Simple
@ 2017-06-14 19:04                           ` Brian Inglis
  0 siblings, 0 replies; 37+ messages in thread
From: Brian Inglis @ 2017-06-14 19:04 UTC (permalink / raw)
  To: cygwin

On 2017-06-14 10:07, cyg Simple wrote:
> On 6/13/2017 1:34 PM, Brian Inglis wrote:
>> On 2017-06-13 08:11, cyg Simple wrote:
>>> On 6/10/2017 10:30 PM, Eric Blake wrote:
>>>> On 06/10/2017 08:48 AM, cyg Simple wrote:
>>>>> Uhm, 'wt' and 'wb' came from MS itself.
>>>> Not quite. fopen(,"wb") comes from POSIX.  "wt" is probably a microsoft
>>>> extension, but it is certainly not in POSIX nor in glibc.
>>> I think it's a C standard so it should be in glibc.  It may be mentioned
>>> in the POSIX standard as in support of the C standard.
>>>>>  GNU GCC was adapted to allow it
>>>> Huh? It's not whether the compiler allows it, but whether libc allows
>>>> it.  ALL libc that are remotely close to POSIX compliant support
>>>> fopen(,"wb"), but only Windows platforms (and NOT glibc) support
>>>> fopen(,"wt").
>>> Looking at http://www.cplusplus.com/reference/cstdio/fopen/ I see:
>>> "If additional characters follow the sequence, the behavior depends on
>>> the library implementation: some implementations may ignore additional
>>> characters so that for example an additional "t" (sometimes used to
>>> explicitly state a text file) is accepted."
>>> There is also a lot of discussion about the topic at:
>>> https://stackoverflow.com/questions/229924/difference-between-files-writen-in-binary-and-text-mode
>>> As for glibc, it will just ignore the extra character but it allows the
>>> use of "wt"; it just means nothing to that C runtime library. It does
>>> aid in portable code though.
>>> As for me conflating GCC with a C runtime - please forgive my lapse in
>>> memory.
>>
>> There's no need for open mode "t", as text is the default mode unless
>> "b" is specified, and assuming you use "cooked" line I/O functions like
>> fgets/fputs, not "raw" binary I/O like fread/fwrite; fscanf ignores all
>> line terminators unless you use formats like "%c" which could see them.
>>
> 
> That isn't exactly true: according to MSDN[1], "t" also governs handling
> of the CTRL-Z EOF marker.  However, I agree that it is worthless.
> Regardless, the C standard states that "t" or any other extra character
> can be added, and it is left to the implementing library to interpret or
> ignore it.  If the C runtime library neither uses nor ignores it, then it
> isn't complying with the C standard.

The Standard supports only /[ra](b|+|b+|+b)?|w(b|+|b+|+b)?x?/, although
implementations may choose to ignore some of the allowed trailing
characters (presumably "b", "+", or "x", as the footnote is unclear), or
the file so created may not be accessible as a stream, and anything else
invokes UB.

"7.21.5.3 The fopen function
Synopsis
1 #include <stdio.h>
FILE *fopen(const char * restrict filename,
const char * restrict mode);
Description
...
3 The argument mode points to a string. If the string is one of the
following, the file is open in the indicated mode. Otherwise, the
behavior is undefined.[271]

r		open text file for reading
w		truncate to zero length or create text file for writing
wx		create text file for writing
a		append; open or create text file for writing at
		end-of-file
rb		open binary file for reading
wb		truncate to zero length or create binary file for
		writing
wbx		create binary file for writing
ab		append; open or create binary file for writing at
		end-of-file
r+		open text file for update (reading and writing)
w+		truncate to zero length or create text file for update
w+x		create text file for update
a+		append; open or create text file for update, writing at
		end-of-file
r+b or rb+	open binary file for update (reading and writing)
w+b or wb+	truncate to zero length or create binary file for update
w+bx or wb+x	create binary file for update
a+b or ab+	append; open or create binary file for update, writing
		at end-of-file
...
[271] If the string begins with one of the above sequences, the
implementation might choose to ignore the remaining characters, or it
might use them to select different kinds of a file (some of which might
not conform to the properties in 7.21.2)."

> [1] https://msdn.microsoft.com/en-us/library/yeby3zcb(v=vs.140).aspx
> 
> "t
> Open in text (translated) mode. In this mode, CTRL+Z is interpreted as
> an EOF character on input. In files that are opened for reading/writing
> by using "a+", fopen checks for a CTRL+Z at the end of the file and
> removes it, if it is possible. This is done because using fseek and
> ftell to move within a file that ends with CTRL+Z may cause fseek to
> behave incorrectly near the end of the file."

Wonder if "t" is also required in order to have <ctrl-Z> recognized as
console input EOF?
That page also documents a bunch of other mode characters and encoding
arguments that make that implementation far from Standard.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2017-06-14 19:04 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-07 16:23 CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts Soegtrop, Michael
2017-06-07 17:23 ` Eric Blake
2017-06-07 19:26   ` Brian Inglis
2017-06-08  8:50   ` Soegtrop, Michael
2017-06-08 13:31     ` Vince Rice
2017-06-08 14:52       ` Soegtrop, Michael
2017-06-08 15:04       ` Eric Blake
2017-06-08 15:08     ` Eric Blake
2017-06-08 17:34       ` cyg Simple
2017-06-08 18:51     ` L A Walsh
2017-06-08 19:57       ` Eric Blake
2017-06-09  8:14         ` Soegtrop, Michael
2017-06-09 14:06           ` cyg Simple
2017-06-09 15:01             ` Soegtrop, Michael
2017-06-09 15:35               ` Andrey Repin
2017-06-09 15:51               ` Eric Blake
2017-06-09 16:56                 ` Soegtrop, Michael
2017-06-09 18:42                   ` Hans-Bernhard Bröker
2017-06-09 19:30                     ` Erik Soderquist
2017-06-09 22:28                     ` Soegtrop, Michael
2017-06-09 22:43                       ` Harvey Stein
2017-06-09 23:16                       ` Ray Donnelly
2017-06-10 13:48                 ` cyg Simple
2017-06-10 14:21                   ` Hans-Bernhard Bröker
2017-06-11  2:31                   ` Eric Blake
2017-06-13 14:11                     ` cyg Simple
2017-06-13 17:34                       ` Brian Inglis
2017-06-14 16:07                         ` cyg Simple
2017-06-14 19:04                           ` Brian Inglis
2017-06-10 13:42               ` cyg Simple
2017-06-09 13:50     ` Brian Inglis
2017-06-09 15:05       ` Soegtrop, Michael
2017-06-09 15:35         ` Andrey Repin
2017-06-09 15:50           ` Dan Kegel
2017-06-09 16:09             ` Soegtrop, Michael
2017-06-10 21:09               ` Brian Inglis
2017-06-09 15:56           ` Eric Blake
