Subject: Re: gawk Regression: CR characters are not stripped on Windows
To: cygwin@cygwin.com, bug-gawk@gnu.org, Eli Zaretskii
From: Eric Blake
Message-ID: <619440c1-0480-41a8-ddc0-216b31f3efd9@redhat.com>
Date: Tue, 27 Feb 2018 16:56:00 -0000

[urrgh - Cygwin's list policy in supplying reply-to makes it difficult to
reply-to-all]

On 02/27/2018 01:22 AM, Orgad Shaneh wrote:
> Hi,
>
> Cross-posting per Eli Zaretskii's request.
>
> CR characters used to be automatically stripped on Windows (MSYS2 and
> Cygwin environments). This is broken in 4.2.0.

You should not think of Cygwin as a Windows environment, but as a
Linux-alike environment. gawk on Linux does not automatically strip CRs;
therefore, gawk on Cygwin should not automatically strip CRs. What MSYS2
does is different, and that environment is entitled to use patches to
make interoperability with native Windows programs nicer, at the expense
of being less like Linux.

Furthermore, the change in Cygwin predates the gawk 4.2.0 release, and
was intentionally made in a coordinated release in Feb 2017 alongside sed
and grep:

https://sourceware.org/ml/cygwin/2017-02/msg00152.html
https://sourceware.org/ml/cygwin/2017-02/msg00188.html
https://sourceware.org/ml/cygwin/2017-02/msg00189.html

following on from discussions about bash after ShellShock:

https://sourceware.org/ml/cygwin/2016-08/msg00097.html

Changing gawk back to automatically strip CRs on Cygwin would be a
regression.

> As Eli said, this change was deliberate. But this has several drawbacks.
>
> 1. The gawk info page states that:
>
>> Under MS-Windows, 'gawk' (and many other text programs) silently
>> translates end-of-line '\r\n' to '\n' on input and '\n' to '\r\n' on
>> output.
>
> and on Feb 8 the following section was added:
>
>> Recent versions of Cygwin open all files in binary mode. This means
>> that you should use 'RS = "\r?\n"' in order to be able to handle
>> standard MS-Windows text files with carriage-return plus line-feed line
>> endings.

Or mount your Windows text files under a text mount in Cygwin (so that
such files already have \r stripped), or add steps to your pipelines to
strip CR before handing the data to gawk.

>
> This breaks compatibility between different gawk versions. What were
> the reasons for this change in cygwin, and why was it pushed upstream?

See the discussion in Feb 2017 for rationale, but the executive summary
is that Cygwin attempts to emulate Linux, and silent corruption of binary
files was deemed worse than having to explicitly strip CR when dealing
with Windows text output.

>
> 2. Git and other tools automatically convert text files to CRLF on
> Windows.

Not Cygwin git. The problems you are encountering are more likely to
happen when you mix and match tools from disparate environments, rather
than when you use all tools from the same source.

> This means that any awk script that runs on both platforms
> must use RS = "\r?\n".

or strip the CR by any other means. But the same is true of any script
that must run on both Windows and Linux.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
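
To illustrate the "strip the CR by other means" option from the message
above, here is a minimal sketch of a pipeline step that removes a trailing
CR from each record before the data reaches the real awk program. The
names strip_cr and crlf_sample are hypothetical helpers invented for this
example, not something from the thread or from gawk itself. The sketch
uses sub() rather than a regex RS on the assumption that it should also
run under awk implementations other than gawk.

```shell
# Hypothetical helper (assumption, not from the thread): remove one
# trailing CR from every record, leaving any other CRs untouched.
strip_cr() {
    awk '{ sub(/\r$/, ""); print }'
}

# Hypothetical sample input: two CRLF-terminated lines, generated with
# printf so the example does not depend on any file on disk.
crlf_sample() {
    printf 'alpha 1\r\ngamma 2\r\n'
}

# Binary-mode behaviour (Linux, current Cygwin): the CR stays attached to
# the record, so each record is 8 characters long instead of 7.
crlf_sample | awk '{ print length($0) }'

# With the CR stripped first, each record is the expected 7 characters.
crlf_sample | strip_cr | awk '{ print length($0) }'
```

With gawk specifically, setting RS = "\r?\n" as quoted from the info page
should have a similar effect without the extra pipeline stage; the
separate step is just easier to bolt onto an existing script.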