public inbox for cygwin@cygwin.com
* 2.10.0: Cygwin now can not work well with a file in dos format.
From: tuyanyi @ 2018-06-15 16:19 UTC
  To: cygwin

Hi,
I've run into a problem processing a text file: the old version worked very well, but the new version doesn't.

1. There is a file named test.txt in DOS format (CRLF line endings) with the following content:
456467987564654654
456467987564654654
456467987564654654
456467987564654655
456467987564654656
456467987564654657
456467987564654658
456467987564654659
456467987564654660
456467987564654661
456467987564654662
456467987564654663
456467987564654664
456467987564654665

I've written an awk script named stat.awk that counts how many times each line occurs in test.txt:
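#!/bin/awk -f
# Count how many times each distinct line occurs.
BEGIN {
	# Create "list" as an array, then empty it.
	list[""] = ""
	delete list
}
{
	# Use the whole record ($0) as the key.
	list[$0]++
}
END {
	# Print each distinct line with its count (order is unspecified).
	for(l in list)
	{
		print l ":" list[l]
	}
}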

In Cygwin 2.874, the output is correct:
$ ./stat.awk test.txt
456467987564654659:1
456467987564654660:1
456467987564654661:1
456467987564654662:1
456467987564654663:1
456467987564654654:3
456467987564654664:1
456467987564654655:1
456467987564654665:1
456467987564654656:1
456467987564654657:1
456467987564654658:1

But in version 2.10.0, the output is as follows, which is not what I want:
$ ./stat.awk test.txt
:16467987564654663
:16467987564654658
:16467987564654664
:16467987564654659
:36467987564654654
456467987564654665:1
:16467987564654660
:16467987564654655
:16467987564654661
:16467987564654656
:16467987564654662
:16467987564654657
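The garbled lines are what you would expect if every key still ends in a carriage return: the terminal prints the digits, the CR moves the cursor back to column 0, and ":1" (or ":3") then overwrites the first two digits. The one correct line, 456467987564654665:1, is presumably the last line of the file, which had no CRLF terminator. A minimal sketch that reproduces this with made-up sample data and makes the hidden \r visible with od -c (the function name count_lines is hypothetical):

```shell
# Same counting logic as stat.awk, reading from standard input.
count_lines() {
    awk '{ list[$0]++ } END { for (l in list) print l ":" list[l] }'
}

# Two CRLF-terminated lines, then a final line with no line ending at all,
# as in a DOS file whose last line lacks CRLF.
printf '4564\r\n4564\r\n4565' | count_lines | od -c
```

In the od -c dump, the repeated key appears with a \r between its digits and the ":", while the unterminated last line prints cleanly.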

Finally, I ran dos2unix to convert the file to Unix format and ran stat.awk again, and this time the output was correct:
$ ./stat.awk test.txt
456467987564654659:1
456467987564654660:1
456467987564654661:1
456467987564654662:1
456467987564654663:1
456467987564654654:3
456467987564654664:1
456467987564654655:1
456467987564654665:1
456467987564654656:1
456467987564654657:1
456467987564654658:1

I wonder what changed between versions 2.874 and 2.10.0. Or is there a configuration setting I can change to fix this in 2.10.0?

By the way, my OS is Windows 7 64-bit, and I installed 64-bit Cygwin 2.10.0.

Thank you very much!

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

* RE: 2.10.0: Cygwin now can not work well with a file in dos format.
From: Soegtrop, Michael @ 2018-06-15 19:46 UTC
  To: tuyanyi, cygwin

Dear Tuyanyi,

What has changed in sed and awk is the handling of carriage returns. The sed and awk in older Cygwin versions strip \r from the input. Newer versions behave like the same tools on Linux and don't strip CR. This is intended behavior, is documented in the release notes, and has been discussed quite extensively on the list (I complained about the same issue some time last year).

Your options are either to strip the \r characters first (e.g. using tr) or to compile old versions of awk and/or sed from source.
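For the first option, a minimal sketch: delete the CR bytes with tr before awk sees them. The sample data is made up, and count_lines_crlf_safe is a hypothetical helper that inlines the logic of the stat.awk script from the original report.

```shell
# Remove every CR byte, then count distinct lines as stat.awk does.
# Note: tr -d '\r' deletes all CRs, including any in the middle of a line.
count_lines_crlf_safe() {
    tr -d '\r' | awk '{ list[$0]++ } END { for (l in list) print l ":" list[l] }'
}

printf '4564\r\n4564\r\n4565\r\n' | count_lines_crlf_safe

# With the script and file from the original report this would be:
#   tr -d '\r' < test.txt | ./stat.awk
```

Because tr only reads standard input, the same filter works whether the data comes from a file or from another command in a pipe.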

Best regards,

Michael
Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928

* Re: 2.10.0: Cygwin now can not work well with a file in dos format.
From: cyg Simple @ 2018-06-15 22:50 UTC
  To: cygwin

On 6/15/2018 10:39 AM, tuyanyi wrote:
> Hi,
> I've faced a problem when deal with a text file now, the old version work very well, but the new version, it wont.
> 

See the following for the solution:
https://www.gnu.org/software/gawk/manual/gawk.html#Cygwin

-- 
cyg Simple

* Re: 2.10.0: Cygwin now can not work well with a file in dos format.
From: cyg Simple @ 2018-06-15 23:50 UTC
  To: cygwin



On 6/15/2018 11:11 AM, Soegtrop, Michael wrote:
> Dear Tuyanyi,
> 
> what has been changed in sed and awk is handling of carriage returns. The sed and awk of older Cygwin version strip \r from the input. Newer versions behave like the same tools on Linux and don't strip CR. This is documented in the release notes, intended behavior and has been discussed quite extensively on the list (I complained about the same issue some-time last year).
> 
> The options you have is either to strip the \r characters away first (e.g. using tr) or to compile old versions of awk and/or sed from sources.
> 

The best option is to follow the suggestions in the manual.
https://www.gnu.org/software/gawk/manual/gawk.html#Cygwin

-- 
cyg Simple

* Re: 2.10.0: Cygwin now can not work well with a file in dos format.
From: Brian Inglis @ 2018-06-16  0:49 UTC
  To: cygwin

On 2018-06-15 09:11, Soegtrop, Michael wrote:
> what has been changed in sed and awk is handling of carriage returns. The sed
> and awk of older Cygwin version strip \r from the input. Newer versions 
> behave like the same tools on Linux and don't strip CR. This is documented
> in the release notes, intended behavior and has been discussed quite
> extensively on the list (I complained about the same issue some-time last
> year).
>
> The options you have is either to strip the \r characters away first (e.g.
> using tr) or to compile old versions of awk and/or sed from sources.

Use sed commands:

	$ sed -e 's/\r$//' ...		# strip input \r
	$ sed ... -e 's/$/\r/' ...	# insert output \r

or awk options:

	$ awk -v RS='\r?\n'		# strip input \r
	$ awk -v ORS='\r\n'		# insert output \r

to work the same on any system.
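A self-contained sketch of the round trip described above: strip a trailing CR on input with sed, then write CRLF on output via awk's ORS. It sticks to portable forms: the CR is put into the sed expression with printf (GNU sed also accepts the \r escape shown above, but not every sed does), and the regex form RS='\r?\n' is avoided because a multi-character RS is an extension (gawk and mawk treat it as a regex). The function name roundtrip is hypothetical.

```shell
# A literal CR character, for use inside the sed expression.
cr=$(printf '\r')

roundtrip() {
    # sed: strip one CR at the end of each line (portable form of s/\r$//).
    # awk: -v assignments get escape processing, so ORS becomes CR LF and
    #      every output record is terminated DOS-style.
    sed -e "s/${cr}\$//" | awk -v ORS='\r\n' '{ print }'
}

# Mixed input: one CRLF line and one LF line; both come out as CRLF.
printf 'a\r\nb\n' | roundtrip | od -c
```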

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

* RE: 2.10.0: Cygwin now can not work well with a file in dos format.
From: Michel LaBarre @ 2018-06-16 14:10 UTC
  To: 'Soegtrop, Michael', 'tuyanyi', 'cygwin'



> -----Original Message-----
> From: cygwin-owner@cygwin.com [mailto:cygwin-owner@cygwin.com] On
> Behalf Of Soegtrop, Michael
> Sent: June 15, 2018 11:11 AM
> To: tuyanyi; cygwin
> Subject: RE: 2.10.0: Cygwin now can not work well with a file in dos format.
> 
> Dear Tuyanyi,
> 
> what has been changed in sed and awk is handling of carriage returns. The sed
> and awk of older Cygwin version strip \r from the input. Newer versions behave
> like the same tools on Linux and don't strip CR. This is documented in the
> release notes, intended behavior and has been discussed quite extensively on
> the list (I complained about the same issue some-time last year).

[Michel LaBarre:] 
I also find the decision unfortunate, as I am one of the many riff-raff who use Cygwin to supplement Windows
and have no need for strict POSIX compliance, but then I get what I pay for :-)  In any event, I have a few
questions:

  1. Where in the release notes is this mentioned, so that I can make sure I find out about future changes?
       Searching for Cygwin release notes gets me to https://cygwin.com/cygwin-ug-net/ov-new.html
       in which I found no mention (or could not recognise one) of CR no longer being stripped.
       Are there more detailed release notes somewhere else?

  2. Various "solutions" have been noted for gawk in related emails - all require minor but pervasive code changes.  
      Are there any similar solutions for the other tools besides scattering "tr" all over the place?
      Is everything affected from sort to grep to join?  This could impact the use of such tools with multiple files
      some of which come from Win32 tools and others from Cygwin tools.  I may be wrong about the
      pervasiveness of the impact since, as I said, nothing was apparent in the release notes that I found.

      Also, I don't recall seeing related discussions on this mailing list.  Would these have taken place on
      another, developer-focused Cygwin mailing list?

  3. Is there any chance of Cygwin providing a global switch to control file behaviour for all the affected tools
     that are generally used for text rather than binary data?

  4. Would MSYS be better for those of us who are trying to supplement Windows rather than running Linux
       on Windows?

Thanks,
/Michel
      
> 
> The options you have is either to strip the \r characters away first (e.g. using tr)
> or to compile old versions of awk and/or sed from sources.
> 
> Best regards,
> 
> Michael



* Re: 2.10.0: Cygwin now can not work well with a file in dos format.
From: Marco Atzeri @ 2018-06-16 15:21 UTC
  To: cygwin

On 6/16/2018 3:06 PM, Michel LaBarre wrote:
> 

> 
> [Michel LaBrre:]
> I also find the decision unfortunate as I am one of the many riff-raff who use Cygwin to supplement windows
> and have no need for strict POSIX compliance but then I get what I pay for :-)  In any event I have a few
> questions:
> 
>    1. Where in the release notes is this mentioned so that I can try to ensure that I find out about future changes?
>         Searching for Cygwin release notes gets me to https://cygwin.com/cygwin-ug-net/ov-new.html
>         In which I have found no (or could not recognise) mention of no longer stripping CR.
>         Are there more detailed release notes somewhere else?

For sed, see https://sourceware.org/ml/cygwin-announce/2017-02/msg00020.html

> 
>    2. Various "solutions" have been noted for gawk in related emails - all require minor but pervasive code changes.
>        Are there any similar solutions for the other tools besides scattering "tr" all over the place?
>        Is everything affected from sort to grep to join?  This could impact the use of such tools with multiple files
>        some of which come from Win32 tools and others from Cygwin tools.  I may be wrong about the
>        pervasiveness of the impact since, as I said, nothing was apparent in the release notes that I found.
> 
>        Also, I don't recall seeing related discussions in this mail-list.  Would these have taken place in
>        another Cygwin-developer-focused mail-list?

No, here.

My solution is to use d2u and u2d from the dos2unix package
to convert files between the two formats.


>    3. Is there any chance of Cygwin providing a pervasive file behaviour control switch for all the affected tools
>       that have been used generally for text rather than binary data handling?

LF and CRLF files are both text files, just for different environments.

> 
>    4. Would MSYS be better for those of us who are trying to supplement Windows rather than running Linux
>         on Windows?

It depends on your preference.

> 
> Thanks,
> /Michel
>        

* RE: 2.10.0: Cygwin now can not work well with a file in dos format.
From: Soegtrop, Michael @ 2018-06-16 20:25 UTC
  To: Michel LaBarre, 'cygwin'

Dear Michel,

>   1. Where in the release notes is this mentioned so that I can try to ensure that I
> find out about future changes?

As far as I can tell, such changes are announced on this mailing list with the subject prefix "[ANNOUNCEMENT]". This specific change was announced here:

https://sourceware.org/ml/cygwin/2017-02/msg00152.html

You can search the mailing list for "ANNOUNCEMENT SED" here:

https://cygwin.com/ml/cygwin/

>   2. Various "solutions" have been noted for gawk in related emails - all require
> minor but pervasive code changes.
>       Are there any similar solutions for the other tools besides scattering "tr" all
> over the place?

I haven't tried it, but according to the release note above, this depends on the mount type. It might help to put your Windows text files on a mount with the text flag set. Binary files should still work on such mounts: as far as I can tell, this mount flag only means that the binary/text mode flag passed to e.g. fopen makes a difference. If you succeed with this, I would be interested to hear about it.

>       Also, I don't recall seeing related discussions in this mail-list.  Would these
> have taken place in
>       another Cygwin-developer-focused mail-list?

It was discussed extensively on this list in February and June last year.

>   3. Is there any chance of Cygwin providing a pervasive file behaviour control
> switch for all the affected tools
>      that have been used generally for text rather than binary data handling?

As stated above, the intended mechanism seems to be to give this hint in the mount tables. You might also want to read through the lengthy discussion on the topic in June last year.

>   4. Would MSYS be better for those of us who are trying to supplement
> Windows rather than running Linux
>        on Windows?

I build complex Linux-centric projects for MinGW on Cygwin. Although this is the intended purpose of MSYS2, Cygwin works better for me. Most of the configure scripts and makefiles I have to handle didn't get very far with MSYS2, while with Cygwin I needed only very minor patches here and there. Things may have changed since then: I tested this about 3 years ago and have been happy with Cygwin ever since. I can say that I run CI tests with the latest Cygwin version daily, and the sed change was the only Cygwin change that broke my builds in the last 3 years. It could definitely be much worse. So while some decisions of the Cygwin team are debatable, it appears to me that they made the right ones.

Best regards

Michael

* RE: 2.10.0: Cygwin now can not work well with a file in dos format.
From: Michel LaBarre @ 2018-06-16 20:34 UTC
  To: 'Soegtrop, Michael', 'cygwin'

Thank you for the thoughtful responses, Michael and Marco.

I am sorting through the references from both of you while trying to keep in mind
all the caveats regarding mount mode, file-path syntax ( /d/zot vs d:\zot ), and whether
any given utility is "line" oriented or not to infer how it might or might not hardwire the open mode.
(BTW, Section 3.2.1 of the user guide may need an update wrt sed ...)

I don't question the decisions - just trying to understand the implications before I next update everything.

The issue of text vs binary open must regularly boost alcohol sales.

(Note to self:  do not skip messages with [ANNOUNCEMENT] in the subject.)

Thanks
/Michel



* Re: 2.10.0: Cygwin now can not work well with a file in dos format.
From: cyg Simple @ 2018-06-17  1:13 UTC
  To: cygwin

On 6/16/2018 10:10 AM, Soegtrop, Michael wrote:
> 
> I haven't tried it but according to the release note above this depends on the mount type. It might help to mount your Windows text files in a mount with the text flag set. Binary files should still work on such mounts - as far as I can tell this mount flag has the effect that the binary / text flag to e.g. fopen makes a difference. If you have success with this, I would be interested to learn about it.
> 

This only affects files on disk.  Pipes are still in binary mode.  So
depending on how awk, sed, etc. receive the data, you may still get
CRLF instead of having the CR removed.  If the application opens the
file directly, then using the text mount option should work.  I would
caution, though, that this creates a need-to-know issue and can cause
headaches if whoever uses the data doesn't know about it.  I strongly
suggest not using this option and instead using the appropriate filters
to handle CRLF.
NOTE: This issue exists for all files on Linux as well, since you never
know when you'll get a file with CRLF that needs to be processed.
You're better off filtering all text files with appropriate tools before
processing them.  A file should contain CRLF only when it has a .txt
extension and is being sent back to a user, and only because of the
brain-dead MS Notepad, which opens the file in binary mode and expects
the file to contain the control characters that move the cursor left
and down.
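One way to apply this advice without depending on mount modes is to make the awk program itself tolerant of CRLF, so it behaves the same whether the data comes from a file or a pipe. A minimal sketch, assuming the only CRs that matter are line terminators: the stat.awk logic from the original report, wrapped in a hypothetical shell function stat_counts, with a POSIX sub() call that drops a trailing CR from each record.

```shell
# CRLF-tolerant variant of stat.awk. Accepts file names as arguments,
# or reads standard input when none are given.
stat_counts() {
    awk '
    {
        sub(/\r$/, "")   # drop a trailing CR, if any, from the record
        list[$0]++
    }
    END {
        for (l in list)
            print l ":" list[l]
    }' "$@"
}

# Piped input with mixed line endings; both "4564" lines are counted together.
printf '4564\r\n4564\n4565\r\n' | stat_counts
```

Unlike RS='\r?\n', the sub() call does not rely on regex record separators, so the script should behave the same under any POSIX awk.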

-- 
cyg Simple

Thread overview:
2018-06-15 16:19 2.10.0: Cygwin now can not work well with a file in dos format tuyanyi
2018-06-15 19:46 ` Soegtrop, Michael
2018-06-15 23:50   ` cyg Simple
2018-06-16  0:49   ` Brian Inglis
2018-06-16 14:10   ` Michel LaBarre
2018-06-16 15:21     ` Marco Atzeri
2018-06-16 20:25     ` Soegtrop, Michael
2018-06-16 20:34       ` Michel LaBarre
2018-06-17  1:13       ` cyg Simple
2018-06-15 22:50 ` cyg Simple
