From: "Jannick"
To: "'Roger Krebs'"
Subject: RE: gawk 4.1.4: CR separate char for CRLF files
Date: Wed, 09 Aug 2017 08:38:00 -0000

Hi Roger,

On Wed, 9 Aug 2017 07:03:24 +0000, Roger Krebs wrote:
> I've added a BEGIN section at the beginning of the awk script file, setting the
> record separator explicitly for the input file (RS) as well as for the output
> file (ORS):
>
> BEGIN {
>     RS="\r\n"
>     ORS="\r\n"
> }
> {
>     ... your script
> }
>
> The RS setting in particular wasn't necessary in the past, but now it is.

That is quite a pain when no easy fallback solution is provided once a major change is applied. For sed, for example, a separate switch '-b' has been added (if I understand the reference to sed in https://cygwin.com/ml/cygwin/2017-08/msg00033.html correctly). For the latest gawk version I cannot see anything like that, which means that, unless I am missing something here, all of our awk scripts run with Cygwin's gawk break without some tweak.

This is, to say the least, unpleasant in light of what Cygwin claims to be, namely 'a large collection of GNU and Open Source tools which provide functionality similar to a Linux distribution on Windows' (from the top of the home page, www.cygwin.com). Again, admittedly I have not dug into the discussion or the substance of the reasoning behind this change to gawk, sed and grep.
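As a minimal sketch only, assuming GNU awk: gawk treats a record separator longer than one character as a regular expression (POSIX leaves that case unspecified), so setting RS to \r?\n ends each record at either CRLF or a bare LF. That avoids having to know the EOL type in advance, whether the value goes into the BEGIN section quoted above or into a small wrapper like this one. The function name gawk_crlf is hypothetical and is not shipped by gawk or Cygwin.

```shell
# Hypothetical wrapper name; not part of gawk or Cygwin.
# Assumes GNU awk: a multi-character RS is treated as a regular expression,
# so a record ends at either CRLF or a bare LF, and the CR never ends up
# in $0 or in the last field.
# gawk processes escape sequences in -v assignments, so '\r' and '\n' become
# a real CR and LF; the single quotes keep the shell from eating the backslashes.
gawk_crlf() {
    gawk -v 'RS=\r?\n' "$@"
}
```

Usage would then be along the lines of gawk_crlf -f script.awk input.txt, for DOS and Linux files alike. If the output must be CRLF as well, add -v 'ORS=\r\n'. A script that assigns RS itself in its BEGIN section overrides the wrapper's value.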

I can see the following *easy* solutions to the situation at hand (input only, for now):

1 - Inserting the BEGIN section, as you suggested, into more than 1k scripts (not feasible due to the additional regression-testing workload)
2 - Calling "gawk -v 'RS=\r\n' -v 'ORS=\r\n'" instead of plain 'gawk' (a hack to undo the extra complexity of the latest gawk; wrapper needed)
3 - Wrapping a d2u/u2d pipe solution (additional app and wrapper needed again)
4 - Using a differently compiled version of gawk which does *not* disable gawk's out-of-the-box feature of swallowing CRs (cf., e.g., http://git.savannah.gnu.org/cgit/gawk.git/tree/awkgram.y#n3543), i.e. without the artificial obstacle of having to know the EOL type of the input file before running gawk.

> It works in all my cases. The only disadvantage: you have to know what kind

... plus the disadvantage of having to amend all the scripts systematically instead of having an external solution

> of files you want to handle in the awk script. The same awk script will not
> work for both DOS files and Linux files.

... another issue caused by the change, one which didn't exist before.

> Best
>
> Roger

Please don't get me wrong, but this raises a real issue, and I am not sure which rationale other than 'let's get more of the Linux feel' drove the decision.

All the best,
J.

> -----Original Message-----
> From: cygwin-owner@cygwin.com [mailto:cygwin-owner@cygwin.com] On Behalf Of Jannick
> Sent: Wednesday, 9 August 2017 02:48
> To: cygwin@cygwin.com
> Subject: RE: gawk 4.1.4: CR separate char for CRLF files
>
> On Tue, 08 Aug 2017 16:23:40 -0700 (PDT), Steven Penny wrote:
> > On Wed, 9 Aug 2017 01:15:08, "Jannick" wrote:
> > > The current version 4.1.4 of gawk appears to handle CR in CRLF files
> > > unpleasantly, i.e. CR is not gracefully swallowed but is treated as a
> > > separate character.
> > >
> > > This makes some, if not all, of the scripts we are working with here
> > > useless unless the input files are converted to LF, which certainly
> > > is not feasible. IIRC the issue did not show up a few versions back.
> > >
> > > Is this a bug, or am I missing something here?
> >
> > Learn to read:
> >
> > http://cygwin.com/ml/cygwin/2017-08/msg00033.html
>
> Thanks, quickly done.
>
> The link reveals that CRLF-to-LF conversion is now mandatory to work with
> Cygwin's gawk on DOS machines. As far as I can see, there is no legacy
> solution like sed's -b switch that would handle the issue easily,
> especially when invoking gawk from makefiles (piping).
>
> I consider this bad news, while admittedly not fully understanding the whole
> background of the move, which is not necessary for now.