public inbox for cygwin@cygwin.com
* Re: gawk 4.1.4: CR separate char for CRLF files
@ 2017-08-16 12:19 Vermessung AVT - Wolfgang Rieger
  2017-08-16 12:27 ` Mailing list threads [was: gawk 4.1.4: CR separate char for CRLF files] David Macek
  0 siblings, 1 reply; 30+ messages in thread
From: Vermessung AVT - Wolfgang Rieger @ 2017-08-16 12:19 UTC (permalink / raw)
  To: cygwin

Achim Gratz wrote:
Vermessung AVT - Wolfgang Rieger writes:
> Another solution which we have been using for many years now, though 
> it might not be feasible for you:

----------------------------- snip --------------------------------

Jannick, another idea I had thought of previously might possibly help:

gawk can include source code with the @include "myfile.awk" syntax. I have sometimes thought of providing a general awk script that deals with oddities of any kind and that can easily be changed in just one place, myfile.awk, when necessary, e.g. after updates. You could even use an optional environment variable to control which script is included. It should be easy to add such an @include line to all gawk scripts automatically. Did you think of something like that?

Kind regards,
Wolfgang


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Mailing list threads [was: gawk 4.1.4: CR separate char for CRLF files]
  2017-08-16 12:19 gawk 4.1.4: CR separate char for CRLF files Vermessung AVT - Wolfgang Rieger
@ 2017-08-16 12:27 ` David Macek
  2017-08-16 22:50   ` Steven Penny
  0 siblings, 1 reply; 30+ messages in thread
From: David Macek @ 2017-08-16 12:27 UTC (permalink / raw)
  To: cygwin

Please stop breaking the message threads, it's hard to comprehend what's happening this way.  If needed, I can help with configuring your e-mail client.

-- 
David Macek



* Re: Mailing list threads [was: gawk 4.1.4: CR separate char for CRLF files]
  2017-08-16 12:27 ` Mailing list threads [was: gawk 4.1.4: CR separate char for CRLF files] David Macek
@ 2017-08-16 22:50   ` Steven Penny
  2017-08-17 12:30     ` cyg Simple
  0 siblings, 1 reply; 30+ messages in thread
From: Steven Penny @ 2017-08-16 22:50 UTC (permalink / raw)
  To: cygwin

On Wed, 16 Aug 2017 14:27:37, David Macek wrote:
> Please stop breaking the message threads, it's hard to comprehend what's ha=
> ppening this way.  If needed, I can help with configuring your e-mail clien=
> t.

You really dont need to be giving people advice about their email client. Glass
houses and all that:

http://cygwin.com/ml/cygwin/2017-08/msg00116.html



* Re: Mailing list threads [was: gawk 4.1.4: CR separate char for CRLF files]
  2017-08-16 22:50   ` Steven Penny
@ 2017-08-17 12:30     ` cyg Simple
  2017-08-17 22:54       ` Steven Penny
  0 siblings, 1 reply; 30+ messages in thread
From: cyg Simple @ 2017-08-17 12:30 UTC (permalink / raw)
  To: cygwin

On 8/16/2017 6:49 PM, Steven Penny wrote:
> On Wed, 16 Aug 2017 14:27:37, David Macek wrote:
>> Please stop breaking the message threads, it's hard to comprehend
>> what's ha=
>> ppening this way.  If needed, I can help with configuring your e-mail
>> clien=
>> t.
> 
> You really dont need to be giving people advice about their email
> client. Glass
> houses and all that:
> 
> http://cygwin.com/ml/cygwin/2017-08/msg00116.html
> 

LOL, at least David's advice wasn't about the client itself.  The fact
that flowed quoted printable is still a valid means of email record is
all and good and if your client cannot manage it then you're the one who
needs to change the client.  David is correct the OP needs to stop
breaking threads and using list etiquette.

-- 
cyg Simple


* Re: Mailing list threads [was: gawk 4.1.4: CR separate char for CRLF files]
  2017-08-17 12:30     ` cyg Simple
@ 2017-08-17 22:54       ` Steven Penny
  0 siblings, 0 replies; 30+ messages in thread
From: Steven Penny @ 2017-08-17 22:54 UTC (permalink / raw)
  To: cygwin

On Thu, 17 Aug 2017 08:30:02, cyg Simple wrote:
> LOL, at least David's advice wasn't about the client itself.  The fact
> that flowed quoted printable is still a valid means of email record is
> all and good and if your client cannot manage it then you're the one who
> needs to change the client.

My "client", is an Awk script that I wrote:

http://github.com/svnpenn/tryst

which I only use for sending and replying to messages, and it works quite well.
For reading messages, I use the archives:

http://cygwin.com/ml/cygwin

So no, his method is not valid, given that he is now knowingly ignoring the fact
that the archives require hard wrapping.

> David is correct the OP needs to stop breaking threads and using list
> etiquette.

That same etiquette would dictate that he implement hard wrapping with his
client, so that others dont have to deal with him (custom CSS via Stylish). As
long as he cannot or will not do this, he doesnt really have a place to
criticise, correct, or even comment on others’ clients.



* Re: gawk 4.1.4: CR separate char for CRLF files
@ 2017-08-16 12:50 cyg Simple
  0 siblings, 0 replies; 30+ messages in thread
From: cyg Simple @ 2017-08-16 12:50 UTC (permalink / raw)
  To: cygwin

Vermessung AVT - Wolfgang Rieger writes:

> 5) You can always find a better way to do things, of course, I won't
> argue about that. Sometimes we thought about switching to Java or php
> or python or whatever. Maybe, we should. But we have a lot of running
> scripts, massive batch and parallel processing, and cmd.exe with
> minimum Cygwin (no X subsystem, no pile of tools, just a tiny
> installation) has worked great for many years - so why not use it?
> Just because it is not intended to use it that way?

Just because it is not intended to use it that way, yes, that is the
reason not to do it.  Just because it works now doesn't mean that it
will continue to work and you put yourself in jeopardy if you ever
update your software.  Your use of cmd.exe instead of a Cygwin shell
also puts you at risk of not being able to execute your scripts.
While Cygwin doesn't intentionally prevent its binaries from running
outside of Cygwin, problems with those binaries are only supported if
they exist within the Cygwin shell as well.  So if an executable
provides expected results in bash but not in cmd, you lose.

-- 
cyg Simple

P.S.: You need to learn how to use a proper mail client and respond to
this list appropriately.  I had to "edit as new" and hand edit the mail
just to get proper quoting.


* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-16 12:09 Vermessung AVT - Wolfgang Rieger
@ 2017-08-16 12:26 ` Eric Blake
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2017-08-16 12:26 UTC (permalink / raw)
  To: cygwin; +Cc: Vermessung AVT - Wolfgang Rieger



On 08/16/2017 07:09 AM, Vermessung AVT - Wolfgang Rieger wrote:
> Achim Gratz wrote:
> Vermessung AVT - Wolfgang Rieger writes:
>> Another solution which we have been using for many years now, though 
>> it might not be feasible for you:
> Cygwin is, like it or not, a rolling distribution.

Your quoting is horrible; you repeated Achim's comments without adding
any '>' nesting,


> Regards,
> Achim.
> -- 
> +<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Then used a '-- ' line, which sane mail clients treat as the end of the
email and start of the signature, and only then added your content.
Thunderbird, in particular, refuses to include signatures by default
when replying to a message; and hitting 'ctrl-a' then reply to at least
paste all of your text then loses the formatting of your message, making
it very difficult to reply to you, as shown here:

> SD adaptations for KORG EX-800 and Poly-800MkII V0.9:
> http://Synth.Stromeko.net/Downloads.html#KorgSDada Dear Achim, I fully
> agree to most of what you say. But: 1) As well as Cygwin is a rolling
> distrib my work is a "rolling work". And that is why I deal with it in

At any rate, in answer to your question:

> 
> Anyway, thanks for the suggestion to contact the upstream developers. I was not aware of that. Can you give me a hint where to go?

awk --help | tail -n10

points you to the manual for how to report upstream bugs; if you don't
like info, the same data can be found here:

https://www.gnu.org/software/gawk/manual/html_node/Bugs.html#Bugs

(in general, ANY good program will include instructions for how to reach
upstream in its --help output - of course, not all programs are at the
same level of goodness in this regard)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



* Re: gawk 4.1.4: CR separate char for CRLF files
@ 2017-08-16 12:09 Vermessung AVT - Wolfgang Rieger
  2017-08-16 12:26 ` Eric Blake
  0 siblings, 1 reply; 30+ messages in thread
From: Vermessung AVT - Wolfgang Rieger @ 2017-08-16 12:09 UTC (permalink / raw)
  To: cygwin

Achim Gratz wrote:
Vermessung AVT - Wolfgang Rieger writes:
> Another solution which we have been using for many years now, though 
> it might not be feasible for you:

Cygwin is, like it or not, a rolling distribution.

> We very rarely update Cygwin. We have been using Cygwin for some 15+ 
> years now. We use tools like gawk (hundreds of scripts), head, tail, 
> sort, etc. that we are using in shell scripts running under cmd.exe 
> (no Unix shells involved). I soon realized that upgrades of Cygwin may 
> cause troubles with existing scripts, so we only update if we really 
> need to (e.g.: New functionality that would be important, 32 to 64 bit 
> shift, eventually new Windows versions, bugs we needed to be fixed).

Hopefully the machine(s) running those scripts are isolated.

In your particular case you might be better off using MSys2 or GNUwin32 tools, although you'd still need a better way to deal with updates.
Also, audit your scripts for non-portable constructs, since those are the parts that are most likely to break.  CMD scripting is a tough nut to crack if it's of any complexity and there are lots of things that are poorly or not officially documented.  I don't quite understand why you use POSIX tools, but specifically shun POSIX scripting.

> I have followed the discussions about the CR/LF behaviour changes in 
> the past attentively and decided not to update in near future, because 
> that would lead to a massive problem with many hundreds of scripts - 
> hoping that sometimes there will be a change in gawk again.

You'd better replace that hope with a feature request at gawk upstream.

> What is Unix-like or OS-like or Posix-like behaviour in that context?
> You could argue that gawk interprets line endings like the underlying 
> OS does (i. e., gawk reads LF in Unix and CR/LF in Win), or it 
> interprets line endings in a Unix-style no matter of the underlying OS 
> used. That's a developer's decision in my opinion.

Cygwin uses LF line endings (yes there are still text mounts, but you'd be better off pretending they don't exist).  When you're trying to use it for CRLF files, you need to wrap those invocations to do an explicit conversion.

https://cygwin.com/cygwin-ug-net/using-textbinary.html

> But since with pipes or output redirection gawk used to write no CRs 
> even in previous versions, we already had the problem that gawk had to 
> accept *both* inputs, LF with or without CR. That worked widely fine 
> so far, since most Windows and other application SW we use accept both 
> record formats, fortunately (we had issues with SW upgrades of other 
> vendors no longer accepting pure LF, but that only concerned a very 
> small number of scripts). With the new approach in Cygwin that seems 
> to be broken, so we did not upgrade Cygwin since then (we currently 
> use gawk 4.1.3).

Again, your attempt to freeze your system at some arbitrary point in time is misguided.  It'll never quite work out and chances are that when it breaks it will do so in ways that create more work and force you to do it in emergency mode, which is never a good thing.

> Of course the reason for that really annoying CR/LF thing is the 
> arrogance and ignorance of MS, which caused innumerable of useless 
> developers' hours when I think of the endless discussions and changes 
> in Cygwin; but MS is the one who defines the standards because of its 
> very market power, so we have to deal with it, if we like or not.

You really can't blame them for CRLF, they weren't and aren't the only ones using it and it's been in use long before Microsoft entered the scene.

> I'd definitely prefer to use Unix for its powerful tools, but most of 
> the SW we use is simply not available for Unix, and MS does not 
> provide gawk etc. So we have to deal with that CR/LF issue in a 
> pragmatic rather than in a more, say, philosophical approach: We need 
> to run our scripts with as little changes as possible. So that's why 
> we upgrade Cygwin as seldom as possible. It is a "living system", yes, 
> which is great on the one side - but can be annoying in everyday 
> practice.

Again, you'd better figure out how to transform your input (and possibly
output) so it'll conform to the conventions of the tool(s) you use, perhaps by providing a handful of wrapper scripts.  Alternatively, only use tools that adhere to the same set of conventions.

> In my opinion there should be at least an option for gawk to accept 
> both LF and CR/LF line endings equally, preferably with a system 
> variable so that there is no need to change the command line call of 
> gawk at all. That's what I vote for.

Yes, but please cast that vote with the upstream developers.  I reckon it'd be a generally useful function, so there's no point in providing it only on Cygwin.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptations for KORG EX-800 and Poly-800MkII V0.9:
http://Synth.Stromeko.net/Downloads.html#KorgSDada




Dear Achim,

I fully agree to most of what you say. But:

1) As well as Cygwin is a rolling distrib my work is a "rolling work". And that is why I deal with it in what I call a pragmatic way: I need a working system with minimal maintenance effort. SW is as it is provided, and I have to adapt, since I mostly can't write my own.

2) When I started using Cygwin some 2 decades ago I was coming from Unix. C programming and awk were what I was used to. In fact, awk was my favourite tool, even for developing small C programs. When I was forced to switch to Windows I needed a way to do text and data processing in a feasible manner and to port several awk scripts to Windows within a short time - awk is a nearly perfect text processing tool to this day, though not widely known. I don't know anything comparable in terms of ease of syntax (once you know C), compactness of code and flexibility, and, most important for me, I am very familiar with it. Somebody then recommended Cygwin, which I set up, learned to work with, and came to like. Decisions had to be made about scripting then, and for various reasons we ended up with cmd.exe and a couple of additional tools along with some major software. It was not at all ideal, but it was very easy and very flexible. Had we known how the system would one day grow, maybe we would have decided differently. Maybe. But I am not sure if we would have come so far. We are a service provider with the need to automate tools, we are not a software vendor.

3) Years later somebody recommended GnuWin as native port, which I immediately tried. However, we ran into serious problems with quoting, as single quote syntax did not work there (Unix and Cygwin: gawk '{print}' would have to be written as gawk "{print}"), which broke a lot of scripts, and there were other problems with providing special characters, quoting, etc. which I could not manage to solve. So we did not switch (and, besides, sometimes I was not sure if GnuWin was still an active system - Cygwin has great user groups and is very active).

4) We have learned a lot of how to incorporate Cygwin in cmd.exe, even with constructs like
for /f "usebackq ..." %%A in (`someprog ... ^| gawk '{...}' ^| something`) do ...
and a lot of other and even more complicated things. That may sound strange, but it works and has worked for many, many years now. A lot is possible!

5) You can always find a better way to do things, of course, I won't argue about that. Sometimes we thought about switching to Java or php or python or whatever. Maybe, we should. But we have a lot of running scripts, massive batch and parallel processing, and cmd.exe with minimum Cygwin (no X subsystem, no pile of tools, just a tiny installation) has worked great for many years - so why not use it? Just because it is not intended to use it that way?

6) We have a grown and growing system. To completely change the system would certainly mean months of work on developer's side. But we have no developer team. We work on projects which have to go on. We do programming where it is necessary in order to automate processes. Our clients don't care if we change software, they want results.

7) Yes, many things could be done much better. I'd like to have the perfect system. But there is no perfect system. Cygwin under cmd.exe works really fine once you have learned its specifics. In fact, Cygwin has done a really great job in our environment for nearly 2 decades so far, even if we mostly don't use it as intended!

There is one point where I disagree. You said,
> Again, you'd better figure out how to transform your input (and possibly
> output) so it'll conform to the conventions of the tool(s) you use, perhaps
> by providing a handful of wrapper scripts.  Alternatively, only use tools that
> adhere to the same set of conventions.
That is exactly what we do and have done so far, as I explained above. The problem comes up when developers decide, for whatever reason, to change the behaviour - which happened with the CRLF handling. You can argue that the previous CRLF handling of gawk was not POSIX conforming. To be honest, I never looked up the POSIX specifications. I use the SW by trying out how it works and adapting to it. A SW vendor may be forced to check compatibility considerations before writing a single line of code (I doubt many of them do so). But I am not a SW vendor. I take the gawk manual when needed, write code and test it. I realise there is that CRLF thing and adapt my scripts accordingly. For many years that worked; the developers did not change the behaviour. Our "input and output perfectly conformed to the conventions" (which means for me, what the SW accepted). Some day they changed the conventions. The reasons are comprehensible, of course, yet it causes a great deal of trouble. That is where we are now. So I "adhere to the same set of conventions" by simply not updating now.

Maybe in 10 years' time another developer group decides to change it another way for any other good reason. Every change in syntax will cause problems. If a SW tool allows several ways, it can be assumed that all of them will be used by different people. If that behaviour is changed, it *will* cause problems for some.


Anyway, thanks for the suggestion to contact the upstream developers. I was not aware of that. Can you give me a hint where to go?

Kind regards,
Wolfgang



* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-14 10:36 Vermessung AVT - Wolfgang Rieger
  2017-08-15 15:41 ` Jannick
@ 2017-08-15 16:43 ` Achim Gratz
  1 sibling, 0 replies; 30+ messages in thread
From: Achim Gratz @ 2017-08-15 16:43 UTC (permalink / raw)
  To: cygwin

Vermessung AVT - Wolfgang Rieger writes:
> Another solution which we have been using for many years now, though
> it might not be feasible for you:

Cygwin is, like it or not, a rolling distribution.

> We very rarely update Cygwin. We have been using Cygwin for some 15+
> years now. We use tools like gawk (hundreds of scripts), head, tail,
> sort, etc. that we are using in shell scripts running under cmd.exe
> (no Unix shells involved). I soon realized that upgrades of Cygwin may
> cause troubles with existing scripts, so we only update if we really
> need to (e.g.: New functionality that would be important, 32 to 64 bit
> shift, eventually new Windows versions, bugs we needed to be fixed).

Hopefully the machine(s) running those scripts are isolated.

In your particular case you might be better off using MSys2 or GNUwin32
tools, although you'd still need a better way to deal with updates.
Also, audit your scripts for non-portable constructs, since those are
the parts that are most likely to break.  CMD scripting is a tough nut
to crack if it's of any complexity and there are lots of things that are
poorly or not officially documented.  I don't quite understand why you
use POSIX tools, but specifically shun POSIX scripting.

> I have followed the discussions about the CR/LF behaviour changes in
> the past attentively and decided not to update in near future, because
> that would lead to a massive problem with many hundreds of scripts -
> hoping that sometimes there will be a change in gawk again.

You'd better replace that hope with a feature request at gawk upstream.

> What is Unix-like or OS-like or Posix-like behaviour in that context?
> You could argue that gawk interprets line endings like the underlying
> OS does (i. e., gawk reads LF in Unix and CR/LF in Win), or it
> interprets line endings in a Unix-style no matter of the underlying OS
> used. That's a developer's decision in my opinion.

Cygwin uses LF line endings (yes there are still text mounts, but you'd
be better off pretending they don't exist).  When you're trying to use
it for CRLF files, you need to wrap those invocations to do an explicit
conversion.

https://cygwin.com/cygwin-ug-net/using-textbinary.html
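
As a rough illustration of the explicit conversion described above, a wrapper along these lines could sit in front of awk. This is only a sketch: the function name crlf_awk is made up, it only handles data arriving on stdin, and it assumes that the output should be CRLF-terminated again.

```shell
#!/bin/sh
# Hypothetical wrapper (the name crlf_awk is an assumption for this sketch).
# 1. strip a trailing CR from each input line,
# 2. run the caller's awk program on LF-only text,
# 3. append CR again so downstream Windows tools see CRLF.
# Only stdin is converted; file operands passed in "$@" would bypass step 1.
crlf_awk() {
    awk '{ sub(/\r$/, ""); print }' |
        awk "$@" |
        awk '{ printf "%s\r\n", $0 }'
}

# Example: print the last field of each CRLF-terminated record.
printf 'a b\r\nc d\r\n' | crlf_awk '{ print $NF }' | od -c
```

Because the first stage only removes a CR at the end of a line, LF-only input passes through unchanged, so the same wrapper can be used when the line-ending style of the input is not known in advance.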

> But since with pipes or output redirection gawk used to write no CRs
> even in previous versions, we already had the problem that gawk had to
> accept *both* inputs, LF with or without CR. That worked widely fine
> so far, since most Windows and other application SW we use accept both
> record formats, fortunately (we had issues with SW upgrades of other
> vendors no longer accepting pure LF, but that only concerned a very
> small number of scripts). With the new approach in Cygwin that seems
> to be broken, so we did not upgrade Cygwin since then (we currently
> use gawk 4.1.3).

Again, your attempt to freeze your system at some arbitrary point in
time is misguided.  It'll never quite work out and chances are that when
it breaks it will do so in ways that create more work and force you to
do it in emergency mode, which is never a good thing.

> Of course the reason for that really annoying CR/LF thing is the
> arrogance and ignorance of MS, which caused innumerable of useless
> developers' hours when I think of the endless discussions and changes
> in Cygwin; but MS is the one who defines the standards because of its
> very market power, so we have to deal with it, if we like or not.

You really can't blame them for CRLF, they weren't and aren't the only
ones using it and it's been in use long before Microsoft entered the
scene.

> I'd definitely prefer to use Unix for its powerful tools, but most of
> the SW we use is simply not available for Unix, and MS does not
> provide gawk etc. So we have to deal with that CR/LF issue in a
> pragmatic rather than in a more, say, philosophical approach: We need
> to run our scripts with as little changes as possible. So that's why
> we upgrade Cygwin as seldom as possible. It is a "living system", yes,
> which is great on the one side - but can be annoying in everyday
> practice.

Again, you'd better figure out how to transform your input (and possibly
output) so it'll conform to the conventions of the tool(s) you use,
perhaps by providing a handful of wrapper scripts.  Alternatively, only
use tools that adhere to the same set of conventions.

> In my opinion there should be at least an option for gawk to accept
> both LF and CR/LF line endings equally, preferably with a system
> variable so that there is no need to change the command line call of
> gawk at all. That's what I vote for.

Yes, but please cast that vote with the upstream developers.  I reckon
it'd be a generally useful function, so there's no point in providing it
only on Cygwin.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptations for KORG EX-800 and Poly-800MkII V0.9:
http://Synth.Stromeko.net/Downloads.html#KorgSDada


* RE: gawk 4.1.4: CR separate char for CRLF files
  2017-08-14 10:36 Vermessung AVT - Wolfgang Rieger
@ 2017-08-15 15:41 ` Jannick
  2017-08-15 16:43 ` Achim Gratz
  1 sibling, 0 replies; 30+ messages in thread
From: Jannick @ 2017-08-15 15:41 UTC (permalink / raw)
  To: cygwin; +Cc: 'Vermessung AVT - Wolfgang Rieger'

Hi Wolfgang,

First of all, many thanks for your interesting experience report and the
constructive remarks.

On Mon, 14 Aug 2017 10:36:23 +0000, Vermessung AVT - Wolfgang Rieger wrote:
> Another solution which we have been using for many years now, though it
> might not be feasible for you:

Yes, you are right, unfortunately: We make extensive use of gawk
extensions that are upgraded in tandem with gawk. Thus we will move
forward with the ongoing gawk development.

> We very rarely update Cygwin. We have been using Cygwin for some 15+
> years now. We use tools like gawk (hundreds of scripts), head, tail,
> sort, etc. that we are using in shell scripts running under cmd.exe
> (no Unix shells involved). I soon realized that upgrades of Cygwin may
> cause troubles with existing scripts, so we only update if we really
> need to (e.g.: New functionality that would be important, 32 to 64 bit
> shift, eventually new Windows versions, bugs we needed to be fixed).
> 
> I have followed the discussions about the CR/LF behaviour changes in the
> past attentively and decided not to update in near future, because that
> would lead to a massive problem with many hundreds of scripts - hoping
> that sometimes there will be a change in gawk again.

Agree - this is the same setting here. Furthermore, we run our heavy
processes on a semi-annual basis within a more than tight time frame.
So Cygwin's update came pretty much out of the blue at the last minute,
because we had not used gawk since the last reporting cycle. It would
have been an unpleasant surprise with potentially heavy time issues if
we had not taken a decision on how to deal with the changed situation.
And as you are saying below ...

> What is Unix-like or OS-like or Posix-like behaviour in that context?
> You could argue that gawk interprets line endings like the underlying
> OS does (i. e., gawk reads LF in Unix and CR/LF in Win), or it
> interprets line endings in a Unix-style no matter of the underlying OS
> used. That's a developer's decision in my opinion.

True. And the developers of gawk opted - with a heavy heart I believe - to 
have gawk swallow CRs.

> But since with pipes or output redirection gawk used to write no CRs
> even in previous versions, we already had the problem that gawk had to
> accept *both* inputs, LF with or without CR. That worked widely fine
> so far, since most Windows and other application SW we use accept both
> record formats, fortunately (we had issues with SW upgrades of other
> vendors no longer accepting pure LF, but that only concerned a very
> small number of scripts). With the new approach in Cygwin that seems
> to be broken, so we did not upgrade Cygwin since then (we currently
> use gawk 4.1.3).

Yes, this is the basis of our SW selection process as well, but we keep
pace with gawk's versions as it develops, so we need a gawk version that
reads files and pipes with either LF or CRLF line endings out of the box.

> Of course the reason for that really annoying CR/LF thing is the
> arrogance and ignorance of MS, which caused innumerable of useless
> developers' hours when I think of the endless discussions and changes
> in Cygwin; but MS is the one who defines the standards because of its
> very market power, so we have to deal with it, if we like or not. I'd
> definitely prefer to use Unix for its powerful tools, but most of the
> SW we use is simply not available for Unix, and MS does not provide
> gawk etc. So we have to deal with that CR/LF issue in a pragmatic
> rather than in a more, say, philosophical approach: We need to run our
> scripts with as little changes as possible. So that's why we upgrade
> Cygwin as seldom as possible. It is a "living system", yes, which is
> great on the one side - but can be annoying in everyday practice.

We are locked into the Windows world as well, so there's no way out of
that. So far I was more than happy that the gawk code comes with the
feature to silently swallow CRs (cf. the code reference with the exact
code line in my previous posting), and that was used until the last
update. Now that things have - from our point of view - changed
tremendously, we were forced to run a decision process looking at the
alternatives (listed in my first email). The evaluation over the past
days led us to the decision to use another source of bilingual versions
of gawk and friends (i.e. they read CRLF and LF without any additional
hint). This is what the user can opt for.

> In my opinion there should be at least an option for gawk to accept
> both LF and CR/LF line endings equally, preferably with a system
> variable so that there is no need to change the command line call of
> gawk at all. That's what I vote for.

Fully agree - I would have been very much in favor of this as well. I
had something close to this in mind in my first posting.
  
> Kind regards,
> Wolfgang

Best regards,
J.



* RE: gawk 4.1.4: CR separate char for CRLF files
@ 2017-08-14 10:36 Vermessung AVT - Wolfgang Rieger
  2017-08-15 15:41 ` Jannick
  2017-08-15 16:43 ` Achim Gratz
  0 siblings, 2 replies; 30+ messages in thread
From: Vermessung AVT - Wolfgang Rieger @ 2017-08-14 10:36 UTC (permalink / raw)
  To: cygwin

On Wed, 9 Aug 2017 10:38 +0000, Jannick wrote:

--- snip ---
> Now I can see the following *easy* solutions to the very situation here (input only for now):
>
> 1 - Inserting the BEGIN section as you suggested into more than 1k scripts (not feasible due to additional regression test workload) 
>
> 2 - Calling 'gawk -vRS=\r\n -vORS=\r\n' instead of 'gawk' (hack to turn back the additional the latest gawk's complexity, wrapper needed)
>
> 3 - Wrapping a d2u/u2d pipe solution (additional app and wrapper needed again)
>
> 4 - Using another compiled version of gawk which does *not* disable the out-of-the-box gawk feature to swallow CRs (cf., e.g., http://git.savannah.gnu.org/cgit/gawk.git/tree/awkgram.y#n3543), i.e.
> without the artificial obstacle to now know the EOL type of the input file ahead of running gawk.
>
>> It works in all my cases. The only disadvantage: you have to know what kind
>
>... plus the disadvantage to systematically amend all the scripts instead of having an external solution 
>
>> of files you want to handle in the awk script. The same awk script 
>> will not
>> work for DOS files as well as for linux files.
>
>... another issue originated by the change and which didn't exist before.
>
>> Best
>> 
>> Roger
>
> Please don't get me wrong, but this raises a real issue here and I am not sure which rationale other than 'let's get more of the Linux-feel' drove the decision.
>
> All the best,
> J. 
--- snip ---

Another solution which we have been using for many years now, though it might not be feasible for you:

We very rarely update Cygwin. We have been using Cygwin for some 15+ years now. We use tools like gawk (hundreds of scripts), head, tail, sort, etc. in shell scripts running under cmd.exe (no Unix shells involved). I soon realized that upgrades of Cygwin may cause trouble with existing scripts, so we only update if we really need to (e.g. new functionality that would be important, the 32 to 64 bit shift, possibly new Windows versions, bugs we need fixed).

I have followed the past discussions about the CR/LF behaviour changes attentively and decided not to update in the near future, because that would lead to a massive problem with many hundreds of scripts - hoping that at some point there will be a change in gawk again.

What is Unix-like or OS-like or POSIX-like behaviour in that context? You could argue that gawk should interpret line endings like the underlying OS does (i.e., gawk reads LF on Unix and CR/LF on Windows), or that it should interpret line endings Unix-style regardless of the underlying OS. That's a developer's decision in my opinion.

But since gawk wrote no CRs to pipes or redirected output even in previous versions, we already had the situation that gawk had to accept *both* inputs, LF with or without CR. That has worked largely fine so far, since fortunately most Windows and other application SW we use accepts both record formats (we had issues with SW upgrades from other vendors no longer accepting pure LF, but that only concerned a very small number of scripts). With the new approach in Cygwin that seems to be broken, so we have not upgraded Cygwin since then (we currently use gawk 4.1.3).

Of course the reason for that really annoying CR/LF thing is the arrogance and ignorance of MS, which has cost innumerable wasted developer hours when I think of the endless discussions and changes in Cygwin; but MS is the one who defines the standards because of its sheer market power, so we have to deal with it, whether we like it or not. I'd definitely prefer to use Unix for its powerful tools, but most of the SW we use is simply not available for Unix, and MS does not provide gawk etc. So we have to deal with that CR/LF issue with a pragmatic rather than a more, say, philosophical approach: we need to run our scripts with as few changes as possible. That's why we upgrade Cygwin as seldom as possible. It is a "living system", yes, which is great on the one hand - but can be annoying in everyday practice.

In my opinion there should be at least an option for gawk to accept both LF and CR/LF line endings equally, preferably with a system variable so that there is no need to change the command line call of gawk at all. That's what I vote for.
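
As a rough sketch of what accepting both line endings can look like with gawk as it stands: when RS is longer than one character, gawk treats it as a regular expression, so a record separator of "\r?\n" matches LF with or without a preceding CR. The helper name read_both and the fallback to a plain awk are assumptions made for illustration; the fallback only works if that awk also supports a regex RS.

```shell
# Prefer gawk; fall back to another awk (assumed to support a regex RS).
AWK=$(command -v gawk || command -v awk)

# Hypothetical helper: print the field count and the last field of each
# record, treating both LF and CRLF as record terminators on input.
read_both() {
    "$AWK" 'BEGIN { RS = "\r?\n" } { print NF ":" $NF }'
}

printf 'a b\r\nc d\r\n' | read_both   # CRLF input
printf 'a b\nc d\n'     | read_both   # LF input, same result
```

Output still uses the default ORS (LF), and the BEGIN block has to be put into or in front of each script, so this is not the system-wide switch asked for above.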

Kind regards,
Wolfgang






* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-11 16:54                             ` Brian Inglis
@ 2017-08-11 17:06                               ` cyg Simple
  0 siblings, 0 replies; 30+ messages in thread
From: cyg Simple @ 2017-08-11 17:06 UTC (permalink / raw)
  To: cygwin

On 8/11/2017 12:54 PM, Brian Inglis wrote:
> On 2017-08-11 06:47, cyg Simple wrote:
>> On 8/10/2017 6:49 PM, Brian Inglis wrote:
>>> On 2017-08-10 15:49, cyg Simple wrote:
>>>> On 8/10/2017 5:34 PM, Brian Inglis wrote:
>>>>>>
>>>>>> http://cygwin.com/ml/cygwin/2017-08/msg00104.html
>>>>>
>>>>> It is flowed format with quoted breaks, which I see reassembled and wrapped in
>>>>> the window by Thunderbird with no issues:
>>>
>>>> So what setting do I have that is causing me to not see it.  Every mail
>>>> David sends displays as empty for me.
>>>
>>> Enable/set wrap settings in config editor, and in Tools/Options/Composition
>>> tab/Send Options... button check Text format Send message as plain text if
>>> possible checkbox/select Convert the message to plain text dropdown; add
>>> cygwin.com and sourceware.org in Plain text domains tab.
> 
>> That is for sending mail, not reading it.  What causes you to be able to
>> read David's mail and not me?
> 
> First part:
>>> Enable/set wrap settings in config editor,
> search for wrap and set toggles to true and values to 80/72/...
> 

Great, thanks for that.  I can now read David's mail.

-- 
cyg Simple


* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-11 12:47                           ` cyg Simple
@ 2017-08-11 16:54                             ` Brian Inglis
  2017-08-11 17:06                               ` cyg Simple
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Inglis @ 2017-08-11 16:54 UTC (permalink / raw)
  To: cygwin

On 2017-08-11 06:47, cyg Simple wrote:
> On 8/10/2017 6:49 PM, Brian Inglis wrote:
>> On 2017-08-10 15:49, cyg Simple wrote:
>>> On 8/10/2017 5:34 PM, Brian Inglis wrote:
>>>>>
>>>>> http://cygwin.com/ml/cygwin/2017-08/msg00104.html
>>>>
>>>> It is flowed format with quoted breaks, which I see reassembled and wrapped in
>>>> the window by Thunderbird with no issues:
>>
>>> So what setting do I have that is causing me to not see it.  Every mail
>>> David sends displays as empty for me.
>>
>> Enable/set wrap settings in config editor, and in Tools/Options/Composition
>> tab/Send Options... button check Text format Send message as plain text if
>> possible checkbox/select Convert the message to plain text dropdown; add
>> cygwin.com and sourceware.org in Plain text domains tab.

> That is for sending mail, not reading it.  What causes you to be able to
> read David's mail and not me?

First part:
>> Enable/set wrap settings in config editor,
search for wrap and set toggles to true and values to 80/72/...

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada


* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-10 22:49                         ` Brian Inglis
@ 2017-08-11 12:47                           ` cyg Simple
  2017-08-11 16:54                             ` Brian Inglis
  0 siblings, 1 reply; 30+ messages in thread
From: cyg Simple @ 2017-08-11 12:47 UTC (permalink / raw)
  To: cygwin

On 8/10/2017 6:49 PM, Brian Inglis wrote:
> On 2017-08-10 15:49, cyg Simple wrote:
>> On 8/10/2017 5:34 PM, Brian Inglis wrote:
>>>>
>>>> http://cygwin.com/ml/cygwin/2017-08/msg00104.html
>>>
>>> It is flowed format with quoted breaks, which I see reassembled and wrapped in
>>> the window by Thunderbird with no issues:
> 
>> So what setting do I have that is causing me to not see it.  Every mail
>> David sends displays as empty for me.
> 
> Enable/set wrap settings in config editor, and in Tools/Options/Composition
> tab/Send Options... button check Text format Send message as plain text if
> possible checkbox/select Convert the message to plain text dropdown; add
> cygwin.com and sourceware.org in Plain text domains tab.
> 

That is for sending mail, not reading it.  What causes you to be able to
read David's mail and not me?

-- 
cyg Simple


* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-10 22:49                         ` Brian Inglis
@ 2017-08-10 23:59                           ` Steven Penny
  0 siblings, 0 replies; 30+ messages in thread
From: Steven Penny @ 2017-08-10 23:59 UTC (permalink / raw)
  To: cygwin

On Thu, 10 Aug 2017 16:48:47, Brian Inglis wrote:
> Many archives and sites display lines off the right margin instead of allowing
> them to wrap as normal in HTML. Possibly using pre format style without
> horizontal scrollbars instead of just specifying a monospace font style. That
> makes it a site or converter design issue!

Nope. Wrong. David has been doing this for over 2 years:

http://cygwin.com/ml/cygwin/2015-01/msg00232.html

So it is a user issue. The user must hard wrap because the Cygwin site does not.
When he knowingly disregards this, he does it to the detriment of all users of
the archives.



* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-10 22:22                       ` Steven Penny
@ 2017-08-10 22:49                         ` Brian Inglis
  2017-08-10 23:59                           ` Steven Penny
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Inglis @ 2017-08-10 22:49 UTC (permalink / raw)
  To: cygwin

On 2017-08-10 16:22, Steven Penny wrote:
> On Thu, 10 Aug 2017 15:34:11, Brian Inglis wrote:
>> It is flowed format with quoted breaks, which I see reassembled and wrapped in
>> the window by Thunderbird with no issues:
> 
> That's great, but it doesn't do that with Firefox, and it doesn't do that with
> Internet Explorer. So for people reading the mailing list via the archives
> (read: me), each line just scrolls off the page until OP decides to break for a
> paragraph.

Many archives and sites display lines off the right margin instead of allowing
them to wrap as normal in HTML. Possibly using pre format style without
horizontal scrollbars instead of just specifying a monospace font style. That
makes it a site or converter design issue!

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada


* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-10 21:49                       ` cyg Simple
@ 2017-08-10 22:49                         ` Brian Inglis
  2017-08-11 12:47                           ` cyg Simple
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Inglis @ 2017-08-10 22:49 UTC (permalink / raw)
  To: cygwin

On 2017-08-10 15:49, cyg Simple wrote:
> On 8/10/2017 5:34 PM, Brian Inglis wrote:
>>>
>>> http://cygwin.com/ml/cygwin/2017-08/msg00104.html
>>
>> It is flowed format with quoted breaks, which I see reassembled and wrapped in
>> the window by Thunderbird with no issues:

> So what setting do I have that is causing me to not see it.  Every mail
> David sends displays as empty for me.

Enable/set wrap settings in config editor, and in Tools/Options/Composition
tab/Send Options... button check Text format Send message as plain text if
possible checkbox/select Convert the message to plain text dropdown; add
cygwin.com and sourceware.org in Plain text domains tab.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada


* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-10 21:34                     ` Brian Inglis
  2017-08-10 21:49                       ` cyg Simple
@ 2017-08-10 22:22                       ` Steven Penny
  2017-08-10 22:49                         ` Brian Inglis
  1 sibling, 1 reply; 30+ messages in thread
From: Steven Penny @ 2017-08-10 22:22 UTC (permalink / raw)
  To: cygwin

On Thu, 10 Aug 2017 15:34:11, Brian Inglis wrote:
> It is flowed format with quoted breaks, which I see reassembled and wrapped in
> the window by Thunderbird with no issues:

That's great, but it doesn't do that with Firefox, and it doesn't do that with
Internet Explorer. So for people reading the mailing list via the archives
(read: me), each line just scrolls off the page until OP decides to break for a
paragraph.



* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-10 21:34                     ` Brian Inglis
@ 2017-08-10 21:49                       ` cyg Simple
  2017-08-10 22:49                         ` Brian Inglis
  2017-08-10 22:22                       ` Steven Penny
  1 sibling, 1 reply; 30+ messages in thread
From: cyg Simple @ 2017-08-10 21:49 UTC (permalink / raw)
  To: cygwin

On 8/10/2017 5:34 PM, Brian Inglis wrote:
>>
>> http://cygwin.com/ml/cygwin/2017-08/msg00104.html
> 
> It is flowed format with quoted breaks, which I see reassembled and wrapped in
> the window by Thunderbird with no issues:
> 

So what setting do I have that is causing me to not see it?  Every mail
David sends displays as empty for me.

-- 
cyg Simple


* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-10 18:35                   ` Steven Penny
@ 2017-08-10 21:34                     ` Brian Inglis
  2017-08-10 21:49                       ` cyg Simple
  2017-08-10 22:22                       ` Steven Penny
  0 siblings, 2 replies; 30+ messages in thread
From: Brian Inglis @ 2017-08-10 21:34 UTC (permalink / raw)
  To: cygwin

On 2017-08-10 12:35, Steven Penny wrote:
> On Thu, 10 Aug 2017 10:45:34, cyg Simple wrote:
>> David, I don't know what it is about your email that my thunderbird
>> client doesn't like but I can't read your email except from reviewing
>> the message source.
> 
> He's using quoted-printable, but he is not actually breaking at 80, so it just
> comes out as one long line. Really annoying, and I usually won't even read posts
> from people who do that. Here is an example:
> 
> http://cygwin.com/ml/cygwin/2017-08/msg00104.html

It is flowed format with quoted breaks, which I see reassembled and wrapped in
the window by Thunderbird with no issues:

> Content-Type: text/plain; charset=utf-8; format=flowed
> Content-Language: en-US
> Content-Transfer-Encoding: quoted-printable
...
> I feel the need to correct you slightly.  Although Linux is a good model, C=
> ygwin primarily strives to be a good *POSIX* platform, so there may be case=
> s where the two intentionally differ.

displays as:

I feel the need to correct you slightly.  Although Linux is a good model, Cygwin
primarily strives to be a good *POSIX* platform, so there may be cases where the
two intentionally differ.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada


* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-10 14:46                 ` cyg Simple
@ 2017-08-10 18:35                   ` Steven Penny
  2017-08-10 21:34                     ` Brian Inglis
  0 siblings, 1 reply; 30+ messages in thread
From: Steven Penny @ 2017-08-10 18:35 UTC (permalink / raw)
  To: cygwin

On Thu, 10 Aug 2017 10:45:34, cyg Simple wrote:
> David, I don't know what it is about your email that my thunderbird
> client doesn't like but I can't read your email except from reviewing
> the message source.

He's using quoted-printable, but he is not actually breaking at 80, so it just
comes out as one long line. Really annoying, and I usually won't even read posts
from people who do that. Here is an example:

http://cygwin.com/ml/cygwin/2017-08/msg00104.html



* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-10 12:31               ` David Macek
@ 2017-08-10 14:46                 ` cyg Simple
  2017-08-10 18:35                   ` Steven Penny
  0 siblings, 1 reply; 30+ messages in thread
From: cyg Simple @ 2017-08-10 14:46 UTC (permalink / raw)
  To: cygwin



On 8/10/2017 8:31 AM, David Macek wrote:

David, I don't know what it is about your email that my Thunderbird
client doesn't like but I can't read your email except by reviewing
the message source.  Your assumption that Cygwin strives to be a good
*POSIX* platform also applies to Linux.  If you find a difference then
there is a discrepancy that should be documented or resolved.  However,
you should determine if you need to resolve the difference if you're on
differing platforms.  So my statement of "if it doesn't work on Linux
but does on Cygwin" still needs to be considered because of portability
issues.

-- 
cyg Simple


* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-10 12:04             ` cyg Simple
@ 2017-08-10 12:31               ` David Macek
  2017-08-10 14:46                 ` cyg Simple
  0 siblings, 1 reply; 30+ messages in thread
From: David Macek @ 2017-08-10 12:31 UTC (permalink / raw)
  To: cygwin


On 10. 8. 2017 14:04, cyg Simple wrote:
> The clue here is, does it only work for this type of OS?  If yes then it
> isn't portable anyway but should it be?  And does it only work on this
> type of OS because of an issue that could change as a result of a fix.
> Cygwin has always been and will always be a work in progress.  The rule
> of thumb "does it work on Linux" should be applied to all that you do
> with Cygwin.  If it only works on Cygwin and not on Linux then the
> chances are, something will change.

I feel the need to correct you slightly.  Although Linux is a good model, Cygwin primarily strives to be a good *POSIX* platform, so there may be cases where the two intentionally differ.

-- 
David Macek



* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-09 19:09           ` Eric Blake
@ 2017-08-10 12:04             ` cyg Simple
  2017-08-10 12:31               ` David Macek
  0 siblings, 1 reply; 30+ messages in thread
From: cyg Simple @ 2017-08-10 12:04 UTC (permalink / raw)
  To: cygwin



On 8/9/2017 3:09 PM, Eric Blake wrote:
> On 08/09/2017 06:03 AM, Eric Blake wrote:
>> On 08/09/2017 03:37 AM, Jannick wrote:
>>
>>> Which is a pretty much of a pain when there is no easy fallback solution
>>> provided in case a major change is applied.
> ...
>>> This is - to say the least - unpleasant in the light of what Cygwin claims
>>> to be, namely 'a large collection of GNU and Open Source tools which provide
>>> functionality similar to a Linux distribution on Windows' (from the top of
>>> the start website www.cygwin.com).
>>
>> On Linux, nothing strips CR automatically.  So on Cygwin, we behave the
>> same - nothing strips CR automatically on binary mounted data.
>>
>> And the fact that the change was made AND ANNOUNCED back in February,
>> but you are now only 6 months later complaining about it, is telling.
> 
> It was pointed out to me off-list that my reply can easily be mis-read
> in a much more negative tone than I intended, so I'm apologizing for
> coming across as mean (yes, I know, https://cygwin.com/acronyms/#WJM).
> I think I was trying to emphasize that complaints about the behavior
> change at the time of the change were expected (and there was indeed a
> reaction, although I was pleasantly surprised at the time that it was
> limited to just a few threads, so apparently not many people were
> negatively impacted - and that's a good thing).  But complaints about
> the behavior after six months are a bit unexpected.  But I guess not
> everyone keeps their software up-to-date on quite as frequent a
> schedule, so I shouldn't have been as surprised or reacted as harshly.
> 

I don't think you need to apologize, in fact your post stopped me from
posting similarly.

> At any rate, my advice continues to be the same: how would you deal with
> CRLF on a Linux system? That's the ideal way to also deal with it on
> Cygwin (we used to have gratuitous incompatibilities between the systems
> where the same command line on Linux did not have the same result as on
> Cygwin; but the change back in February was to get rid of those
> incompatibilities, even if it breaks scripts that were unwisely relying
> on the incompatibilities).
> 

The clue here is, does it only work for this type of OS?  If yes then it
isn't portable anyway but should it be?  And does it only work on this
type of OS because of an issue that could change as a result of a fix.
Cygwin has always been and will always be a work in progress.  The rule
of thumb "does it work on Linux" should be applied to all that you do
with Cygwin.  If it only works on Cygwin and not on Linux then the
chances are, something will change.

-- 
cyg Simple


* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-09 11:03         ` Eric Blake
@ 2017-08-09 19:09           ` Eric Blake
  2017-08-10 12:04             ` cyg Simple
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Blake @ 2017-08-09 19:09 UTC (permalink / raw)
  To: cygwin



On 08/09/2017 06:03 AM, Eric Blake wrote:
> On 08/09/2017 03:37 AM, Jannick wrote:
> 
>> Which is a pretty much of a pain when there is no easy fallback solution
>> provided in case a major change is applied.
...
>> This is - to say the least - unpleasant in the light of what Cygwin claims
>> to be, namely 'a large collection of GNU and Open Source tools which provide
>> functionality similar to a Linux distribution on Windows' (from the top of
>> the start website www.cygwin.com).
> 
> On Linux, nothing strips CR automatically.  So on Cygwin, we behave the
> same - nothing strips CR automatically on binary mounted data.
> 
> And the fact that the change was made AND ANNOUNCED back in February,
> but you are now only 6 months later complaining about it, is telling.

It was pointed out to me off-list that my reply can easily be mis-read
in a much more negative tone than I intended, so I'm apologizing for
coming across as mean (yes, I know, https://cygwin.com/acronyms/#WJM).
I think I was trying to emphasize that complaints about the behavior
change at the time of the change were expected (and there was indeed a
reaction, although I was pleasantly surprised at the time that it was
limited to just a few threads, so apparently not many people were
negatively impacted - and that's a good thing).  But complaints about
the behavior after six months are a bit unexpected.  But I guess not
everyone keeps their software up-to-date on quite as frequent a
schedule, so I shouldn't have been as surprised or reacted as harshly.

At any rate, my advice continues to be the same: how would you deal with
CRLF on a Linux system? That's the ideal way to also deal with it on
Cygwin (we used to have gratuitous incompatibilities between the systems
where the same command line on Linux did not have the same result as on
Cygwin; but the change back in February was to get rid of those
incompatibilities, even if it breaks scripts that were unwisely relying
on the incompatibilities).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-09  8:38       ` Jannick
@ 2017-08-09 11:03         ` Eric Blake
  2017-08-09 19:09           ` Eric Blake
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Blake @ 2017-08-09 11:03 UTC (permalink / raw)
  To: cygwin



On 08/09/2017 03:37 AM, Jannick wrote:

> Which is a pretty much of a pain when there is no easy fallback solution
> provided in case a major change is applied. E.g. for sed - if I understand
> the reference to sed in https://cygwin.com/ml/cygwin/2017-08/msg00033.html
> correctly - a separate switch '-b' is added.

Incorrect. 'sed -b' has always existed, but did NOT do what you wanted
(it forced CR to be treated as a separate character; where what you want
is to ignore CR if it appears before LF).  In fact, the coordinated
change made back in February to all of grep, sed, and awk, was that all
three programs now default to what used to be possible only through 'sed
-b', because silently stripping CR can corrupt data when you are not
expecting it, while requiring the user to explicitly strip CR when they
know they are working with CRLF line endings is less magic (fewer
downstream patches, and more obvious in looking at a script that the
script knows what it is doing).

If your data lives on a text mount (instead of a binary mount), then you
still get CR stripping for free.  If your data comes from a pipeline
rather than the file system, then you can add a d2u or other
CR-stripping tool in the pipeline.
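
A minimal sketch of the pipeline approach, assuming tr stands in for d2u as the "other CR-stripping tool" (the helper name strip_cr is made up for this example):

```shell
# Hypothetical helper: delete every CR byte from stdin. Note that this also
# removes CRs that are not part of a CRLF pair.
strip_cr() {
    tr -d '\r'
}

# CRLF data arriving on a pipe is normalized before awk sees it, so the last
# field no longer carries a trailing CR.
printf 'x 1\r\ny 2\r\n' | strip_cr | awk '{ print $2 }'
```

If lone CRs inside the data must survive, a tool that only touches the CR directly before LF (such as d2u) is the safer choice.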


> This is - to say the least - unpleasant in the light of what Cygwin claims
> to be, namely 'a large collection of GNU and Open Source tools which provide
> functionality similar to a Linux distribution on Windows' (from the top of
> the start website www.cygwin.com).

On Linux, nothing strips CR automatically.  So on Cygwin, we behave the
same - nothing strips CR automatically on binary mounted data.

And the fact that the change was made AND ANNOUNCED back in February,
but you are now only 6 months later complaining about it, is telling.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



* RE: gawk 4.1.4: CR separate char for CRLF files
  2017-08-09  7:03     ` AW: " Roger Krebs
@ 2017-08-09  8:38       ` Jannick
  2017-08-09 11:03         ` Eric Blake
  0 siblings, 1 reply; 30+ messages in thread
From: Jannick @ 2017-08-09  8:38 UTC (permalink / raw)
  To: 'Roger Krebs', cygwin

Hi Roger,

On Wed, 9 Aug 2017 07:03:24 +0000, Roger Krebs wrote:
> I've added a BEGIN section at the beginning of the awk script file setting the
> record separator explicitly for the input file (RS) as well as for the output
> file (ORS):
> BEGIN {
>         RS="\r\n"
>         ORS="\r\n"
> }
> {
>    ... your script
> }
> 
> Especially the RS parameter wasn't necessary in the past but now it is.

Which is pretty much a pain when no easy fallback solution is provided for a
major change. E.g. for sed - if I understand the reference to sed in
https://cygwin.com/ml/cygwin/2017-08/msg00033.html correctly - a separate
switch '-b' is added. For the latest gawk version I cannot see anything like
that, which means that all of our awk scripts run against Cygwin's gawk break
without some tweak, unless I am missing something here.

This is - to say the least - unpleasant in the light of what Cygwin claims
to be, namely 'a large collection of GNU and Open Source tools which provide
functionality similar to a Linux distribution on Windows' (from the top of
the start website www.cygwin.com). Again, admittedly I did not dive into the
discussion and the substance of the reasoning to make this move to gawk |
sed | grep.

Now I can see the following *easy* solutions to the very situation here
(input only for now):

1 - Inserting the BEGIN section as you suggested into more than 1k scripts
(not feasible due to additional regression test workload) 

2 - Calling 'gawk -vRS=\r\n -vORS=\r\n' instead of 'gawk' (hack to undo the
additional complexity of the latest gawk, wrapper needed)

3 - Wrapping a d2u/u2d pipe solution (additional app and wrapper needed
again)

4 - Using another compiled version of gawk which does *not* disable the
out-of-the-box gawk feature to swallow CRs (cf., e.g.,
http://git.savannah.gnu.org/cgit/gawk.git/tree/awkgram.y#n3543), i.e.
without the artificial obstacle of now having to know the EOL type of the
input file ahead of running gawk.
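
As a rough illustration of option 3, a wrapper could look like the sketch
below. The function name awk_crlf is hypothetical, plain awk stands in for
d2u/u2d so that no additional app is needed, and the sketch only covers input
arriving on stdin (the makefile piping case), not file arguments.

```shell
# Hypothetical wrapper (not part of gawk or Cygwin): run an unmodified awk
# script on CRLF input arriving on stdin and emit CRLF output again.
awk_crlf() {
    # Stage 1: drop a trailing CR from every input line.
    # Stage 2: run the caller's awk program and options unchanged.
    # Stage 3: terminate every output line with CRLF.
    awk '{ sub(/\r$/, ""); print }' |
        awk "$@" |
        awk '{ printf "%s\r\n", $0 }'
}
```

Usage would be along the lines of: printf 'a b\r\n' | awk_crlf '{ print $2 }'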

> It works in all my cases. The only disadvantage: you have to know what kind

... plus the disadvantage of having to systematically amend all the scripts
instead of having an external solution

> of files you want to handle in the awk script. The same awk script will not
> work for DOS files as well as for linux files.

... another issue caused by the change, one which didn't exist before.
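
One way a single script might cope with both kinds of files is to drop a
trailing CR in its first rule, so that later rules see the same fields either
way. A minimal sketch, assuming a hypothetical helper named last_field; with
gawk specifically, setting RS to the regex "\r?\n" should have a similar
effect on input.

```shell
# Hypothetical helper: print the last field of every record, whether the
# input uses CRLF or LF line endings.
last_field() {
    awk '
        { sub(/\r$/, "") }   # drop a trailing CR; changing $0 re-splits the fields
        { print $NF }
    '
}

printf 'x y\r\n' | last_field   # DOS-style line
printf 'x y\n'   | last_field   # Unix-style line
```

This still means touching each script once, but the script no longer needs to
know the EOL type of its input ahead of time.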

> Best
> 
> Roger

Please don't get me wrong, but this raises a real issue here and I am not
sure which rationale other than 'let's get more of the Linux-feel' drove the
decision.

All the best,
J. 

> -----Original Message-----
> From: cygwin-owner@cygwin.com [mailto:cygwin-owner@cygwin.com] On
> Behalf Of Jannick
> Sent: Wednesday, 9 August 2017 02:48
> To: cygwin@cygwin.com
> Subject: RE: gawk 4.1.4: CR separate char for CRLF files
> 
> On Tue, 08 Aug 2017 16:23:40 -0700 (PDT), Steven Penny wrote:
> > On Wed, 9 Aug 2017 01:15:08, "Jannick" wrote:
> > > the current version 4.1.4 of gawk appears to unpleasantly treat CR
> > > for CRLF files, i.e. CR is not gracefully swallowed, but is a
> > > separate character.
> > >
> > > This makes some, if not all, of the scripts we are working with here
> > > useless, unless the input files are converted to LF which certainly
> > > is not feasible. IIRC the issue did not show up some versions back.
> > >
> > > Is this a bug - or am I missing something here?
> >
> > Learn to read:
> >
> > http://cygwin.com/ml/cygwin/2017-08/msg00033.html
> 
> Thanks - quickly done.
> 
> The link reveals that CRLF/LF conversion is now mandatory to work with
> cygwin's gawk on DOS machines. As far as I can see there is no legacy
> solution like for, e.g., sed (-b switch) to have an easy solution for the
> issue, especially when invoking gawk from makefiles (piping).
> 
> I consider this bad news while admittedly not fully understanding the whole
> background of the move which is not necessary for now.



* RE: gawk 4.1.4: CR separate char for CRLF files
  2017-08-08 23:23 ` Steven Penny
@ 2017-08-09  0:49   ` Jannick
  2017-08-09  7:03     ` AW: " Roger Krebs
  0 siblings, 1 reply; 30+ messages in thread
From: Jannick @ 2017-08-09  0:49 UTC (permalink / raw)
  To: cygwin

On Tue, 08 Aug 2017 16:23:40 -0700 (PDT), Steven Penny wrote:
> On Wed, 9 Aug 2017 01:15:08, "Jannick" wrote:
> > the current version 4.1.4 of gawk appears to unpleasantly treat CR for
> > CRLF files, i.e. CR is not gracefully swallowed, but is a separate
> > character.
> >
> > This makes some, if not all, of the scripts we are working with here
> > useless, unless the input files are converted to LF which certainly is
> > not feasible. IIRC the issue did not show up some versions back.
> >
> > Is this a bug - or am I missing something here?
> 
> Learn to read:
> 
> http://cygwin.com/ml/cygwin/2017-08/msg00033.html

Thanks - quickly done.

The link reveals that CRLF/LF conversion is now mandatory to work with
cygwin's gawk on DOS machines. As far as I can see there is no legacy
solution like for, e.g., sed (-b switch) to have an easy solution for the
issue, especially when invoking gawk from makefiles (piping). 

I consider this bad news while admittedly not fully understanding the whole
background of the move which is not necessary for now. 



* Re: gawk 4.1.4: CR separate char for CRLF files
  2017-08-08 23:16 Jannick
@ 2017-08-08 23:23 ` Steven Penny
  2017-08-09  0:49   ` Jannick
  0 siblings, 1 reply; 30+ messages in thread
From: Steven Penny @ 2017-08-08 23:23 UTC (permalink / raw)
  To: cygwin

On Wed, 9 Aug 2017 01:15:08, "Jannick" wrote:
> the current version 4.1.4 of gawk appears to unpleasantly treat CR for CRLF
> files, i.e. CR is not gracefully swallowed, but is a separate character.
> 
> This makes some, if not all, of the scripts we are working with here
> useless, unless the input files are converted to LF which certainly is not
> feasible. IIRC the issue did not show up some versions back. 
> 
> Is this a bug - or am I missing something here?

Learn to read:

http://cygwin.com/ml/cygwin/2017-08/msg00033.html



* gawk 4.1.4: CR separate char for CRLF files
@ 2017-08-08 23:16 Jannick
  2017-08-08 23:23 ` Steven Penny
  0 siblings, 1 reply; 30+ messages in thread
From: Jannick @ 2017-08-08 23:16 UTC (permalink / raw)
  To: cygwin

Dear All,

the current version 4.1.4 of gawk appears to unpleasantly treat CR for CRLF
files, i.e. CR is not gracefully swallowed, but is a separate character.
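
The symptom can be reproduced with a one-liner such as the sketch below (the
function name cr_symptom is made up for illustration). Where awk keeps the
CR, the second field is "b" plus CR, so the reported length is 2 rather
than 1.

```shell
# Hypothetical reproduction: feed one CRLF-terminated line to awk and report
# the length of its second field.
cr_symptom() {
    printf 'a b\r\n' | awk '{ print length($2) }'
}

cr_symptom
```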

This makes some, if not all, of the scripts we are working with here
useless, unless the input files are converted to LF which certainly is not
feasible. IIRC the issue did not show up some versions back. 

Is this a bug - or am I missing something here?

Thanks,
J. - living on Win10




Thread overview: 30+ messages
2017-08-16 12:19 gawk 4.1.4: CR separate char for CRLF files Vermessung AVT - Wolfgang Rieger
2017-08-16 12:27 ` Mailing list threads [was: gawk 4.1.4: CR separate char for CRLF files] David Macek
2017-08-16 22:50   ` Steven Penny
2017-08-17 12:30     ` cyg Simple
2017-08-17 22:54       ` Steven Penny
  -- strict thread matches above, loose matches on Subject: below --
2017-08-16 12:50 gawk 4.1.4: CR separate char for CRLF files cyg Simple
2017-08-16 12:09 Vermessung AVT - Wolfgang Rieger
2017-08-16 12:26 ` Eric Blake
2017-08-14 10:36 Vermessung AVT - Wolfgang Rieger
2017-08-15 15:41 ` Jannick
2017-08-15 16:43 ` Achim Gratz
2017-08-08 23:16 Jannick
2017-08-08 23:23 ` Steven Penny
2017-08-09  0:49   ` Jannick
2017-08-09  7:03     ` AW: " Roger Krebs
2017-08-09  8:38       ` Jannick
2017-08-09 11:03         ` Eric Blake
2017-08-09 19:09           ` Eric Blake
2017-08-10 12:04             ` cyg Simple
2017-08-10 12:31               ` David Macek
2017-08-10 14:46                 ` cyg Simple
2017-08-10 18:35                   ` Steven Penny
2017-08-10 21:34                     ` Brian Inglis
2017-08-10 21:49                       ` cyg Simple
2017-08-10 22:49                         ` Brian Inglis
2017-08-11 12:47                           ` cyg Simple
2017-08-11 16:54                             ` Brian Inglis
2017-08-11 17:06                               ` cyg Simple
2017-08-10 22:22                       ` Steven Penny
2017-08-10 22:49                         ` Brian Inglis
2017-08-10 23:59                           ` Steven Penny
