public inbox for cygwin@cygwin.com
* gawk Regression: CR characters are not stripped on Windows
@ 2018-02-27  7:22 Orgad Shaneh
From: Orgad Shaneh @ 2018-02-27  7:22 UTC (permalink / raw)
  To: cygwin, bug-gawk, Eli Zaretskii


Hi,

Cross-posting per Eli Zaretskii's request.

CR characters used to be stripped automatically on Windows (in MSYS2 and
Cygwin environments). This is broken in gawk 4.2.0.

Minimal example:
echo -en "foo\r\n\r\nbar\r\n" > foo.txt
# This worked with 4.1.4 and doesn't work with 4.2.0:
awk '/^$/ { print "found" }' foo.txt
# This works with 4.2.0 and doesn't work with 4.1.4:
awk '/^\r$/ { print "found" }' foo.txt

Bisected to commit 5db38f775d9ba239e125d81dff2010a2ddacb48e:
(* gawkmisc.c (cygwin_premain0, cygwin_premain2): Remove.
No longer needed).

Apparently it's still needed...

This issue was reported in https://github.com/git-for-windows/git/issues/1524

Proposed patch is attached.

As Eli said, this change was deliberate, but it has several drawbacks.

1. The gawk info page states that:

> Under MS-Windows, 'gawk' (and many other text programs) silently
> translates end-of-line '\r\n' to '\n' on input and '\n' to '\r\n' on
> output.

and on Feb 8 the following section was added:

> Recent versions of Cygwin open all files in binary mode.  This means
> that you should use 'RS = "\r?\n"' in order to be able to handle
> standard MS-Windows text files with carriage-return plus line-feed line
> endings.

This breaks compatibility between different gawk versions. What were
the reasons for this change in Cygwin, and why was it pushed upstream?
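
For reference, here is a minimal sketch of the workaround that the new
manual text suggests. It reuses the sample data from the minimal example
above; printf stands in for echo -en, and the file name and the "out"
variable are just for illustration. Note that a regexp RS is an
extension beyond POSIX awk (gawk and mawk support it), so this is not
guaranteed to work with every awk.

```shell
# Sample input with CRLF line endings (same data as the minimal example).
printf 'foo\r\n\r\nbar\r\n' > foo.txt

# Accept both LF and CRLF as the record terminator, so that /^$/ matches
# an empty line whether or not the runtime strips CR on input.
out=$(awk 'BEGIN { RS = "\r?\n" } /^$/ { print "found" }' foo.txt)
echo "$out"
```

Every script that may see CRLF input needs this BEGIN rule, which is
exactly the compatibility cost described above.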

2. Git and other tools automatically convert text files to CRLF on
Windows. This means that any awk script that runs on both platforms
must now use RS = "\r?\n". One example that was broken by this
behavior change is Gerrit's commit-msg hook[1], which scans for empty
lines with the /^$/ regexp.
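
To illustrate what such a script would need in order to behave the same
on both platforms, here is a rough sketch of an alternative to changing
RS: strip a trailing CR from each record before matching. The file names
and the strip_cr_prog variable are hypothetical, and this is not what
the Gerrit hook actually does.

```shell
# The same text with CRLF and with LF line endings.
printf 'foo\r\n\r\nbar\r\n' > crlf.txt
printf 'foo\n\nbar\n' > lf.txt

# Remove one trailing CR (if present) from each record, then look for
# empty lines. On LF-only input the sub() call is a no-op.
strip_cr_prog='{ sub(/\r$/, "") } /^$/ { print "found" }'

crlf_out=$(awk "$strip_cr_prog" crlf.txt)
lf_out=$(awk "$strip_cr_prog" lf.txt)
echo "crlf: $crlf_out, lf: $lf_out"
```

Either way, existing scripts have to be modified before they work with
4.2.0, which is the point of this report.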

Please consider reverting this change. Patch attached.

[1] https://gerrit.googlesource.com/gerrit/+/376a7bbb64f1b3f13c261f4efa0af0e8538cfe9b/resources/com/google/gerrit/server/tools/root/hooks/commit-msg#101

- Orgad

[-- Attachment #2: 0001-Revert-default-mode-on-Cygwin-from-binary-back-to-te.patch --]
[-- Type: application/octet-stream, Size: 4649 bytes --]


Thread overview: 14+ messages
2018-02-27  7:22 gawk Regression: CR characters are not stripped on Windows Orgad Shaneh
2018-02-27  9:50 ` Andrey Repin
2018-02-27 10:13   ` Orgad Shaneh
2018-02-27 12:55     ` Steven Penny
2018-02-27 11:09 ` Houder
2018-02-27 15:03 ` Brian Inglis
2018-02-27 16:56 ` Eric Blake
2018-03-05 13:36 ` [bug-gawk] " arnold
2018-03-05 14:00   ` Corinna Vinschen
2018-03-05 14:23     ` arnold
2018-03-05 14:43       ` arnold
2018-03-05 21:54       ` Andrey Repin
2018-03-06  0:33         ` Vince Rice
2018-03-06  4:42         ` arnold
