public inbox for cygwin@cygwin.com
* 1.7.9: Problem with line endings of Perl output redirected to a file with textmode mounting
@ 2011-05-18  6:27 Sven Severus
  2011-05-24  4:20 ` Reini Urban
  2011-05-25  6:17 ` Sven Severus
  0 siblings, 2 replies; 5+ messages in thread
From: Sven Severus @ 2011-05-18  6:27 UTC
  To: cygwin

Hello all,

I would like to report some strange behaviour with Cygwin Perl (I'm using
cygwin1.dll 1.7.9-1, full installation two weeks ago).

File foo.h is an ordinary text file in which every line is terminated
with a DOS-style line ending <cr> <lf> (hex: 0d 0a).
It is located in a directory that is mounted in textmode in Cygwin.
One <cr> <lf> sequence of foo.h is split by a 4096-byte boundary within
the file: "od -c -Ax foo.h" shows a <cr> (='\r') at byte offset 4095
(0xfff) and a <lf> (='\n') at offset 4096 (0x1000):
...
000ff0   /   /   /   /   /   /  \r  \n   /   /   X   X   X   X   X  \r
001000  \n   /   /  \r  \n   /   /  \r  \n
001009
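
For anyone who wants to recreate the layout shown in the dump above, here
is a minimal sketch in Python (used only as a neutral scripting language; it
is not part of the original report). The function name
build_straddling_crlf and the filler line are made up for illustration. It
builds CRLF-terminated text in which one <cr> sits at offset boundary-1 and
its <lf> at offset boundary.

```python
def build_straddling_crlf(boundary: int = 4096, line: bytes = b"//XXXXX") -> bytes:
    """Return CRLF-terminated text with one CR at boundary-1 and its LF at boundary.

    Hypothetical helper for illustration; 'line' is arbitrary filler text.
    """
    unit = line + b"\r\n"
    # Whole filler lines that fit in front of the CR that must land on boundary-1.
    n_full = (boundary - 1) // len(unit)
    # Extra filler bytes for the straddling line, so that its CR lands on boundary-1.
    pad = (boundary - 1) - n_full * len(unit)
    # Filler lines, the straddling line, then one more ordinary line.
    return unit * n_full + b"/" * pad + b"\r\n" + unit
```

Writing the returned bytes to a file opened in binary mode should give a
file whose "od -c -Ax" dump shows \r as the last byte of the 000ff0 row and
\n as the first byte of the 001000 row. The total size will differ from the
foo.h in the report.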

Now I issued the command "perl -pe 's/12345/54321/' foo.h >foomod.h"
to produce foomod.h, located in the same directory as foo.h and thus
also on a textmode mount.
When I examined the result, I noticed that foomod.h was one byte bigger
than foo.h. I expected identical sizes, but "od -c -Ax foomod.h" reports:
...
000ff0   /   /   /   /   /   /  \r  \n   /   /   X   X   X   X   X  \r
001000  \r  \n   /   /  \r  \n   /   /  \r  \n
00100a

Oops! The original <cr> <lf> sequence starting at offset 4095 (0xfff)
became a three-character sequence <cr> <cr> <lf>! The <cr> is duplicated!

In other files created by Perl with output redirection I observed this
behaviour for every <cr> <lf> line ending that is split by a 4096-byte
boundary (even multiple times in one output file). Line endings not
split by a 4096-byte boundary do not show this behaviour.
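
To check a whole output file for this pattern, a small scan can help. The
sketch below is Python (not part of the original report; the function names
are hypothetical). It lists every offset at which <cr> <cr> <lf> starts and,
on the assumption that each earlier duplicated <cr> has shifted the rest of
the file by one byte, tests whether the first <cr> was the last byte of a
4096-byte block in the unshifted data.

```python
from typing import List


def doubled_cr_offsets(data: bytes) -> List[int]:
    """Return the offsets at which a CR CR LF sequence starts."""
    offsets = []
    pos = data.find(b"\r\r\n")
    while pos != -1:
        offsets.append(pos)
        pos = data.find(b"\r\r\n", pos + 1)
    return offsets


def on_block_edge(offsets: List[int], block: int = 4096) -> List[bool]:
    """For each hit, tell whether its first CR ended a block before any insertion.

    Assumption: the k-th hit (counting from 0) was shifted by k extra bytes.
    """
    return [(off - k + 1) % block == 0 for k, off in enumerate(offsets)]
```

The input would be the raw bytes of the suspect file, read in binary mode
so that no further line-ending translation takes place.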

The behaviour does not occur when the destination file is located
in a directory with binmode mounting. Nor does it occur when
I use sed instead of Perl ("sed -e 's/12345/54321/' foo.h >foomod.h"),
so I think the problem is specific to Cygwin Perl, not to Cygwin in
general.

Is this a bug in the output buffering mechanism of Cygwin Perl?
Or am I doing something wrong?
Any answer is highly appreciated. Thanks in advance.

Best regards
Sven

-- 
With kind regards

Dipl. Inform. Sven Severus
Software Development
----------------------------------------------------------
HIMA Paul Hildebrandt GmbH + CO KG
Dept: Software Development
Albert-Bassermann-Strasse 28
68782 Bruehl
Germany

Tel: +49 6202 709-289
Fax: +49 6202 709-299
E-Mail: s.severus@hima.com
Internet: www.hima.de


-- 
HIMA Paul Hildebrandt GmbH + Co KG, Albert-Bassermann-Str. 28, 68782 Bruehl near Mannheim
Limited partnership, registered office Bruehl, Germany - Register court Mannheim HRA 421017
VAT ID: DE 144286400, Tax No.: 43038 00190

General partner: Paul Hildebrandt Verwaltungsgesellschaft mbH,
registered office Bruehl, Germany - Register court Mannheim HRB 420588

Managing director: Dipl.-Betriebswirt Steffen Philipp


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

* Re: 1.7.9: Problem with line endings of Perl output redirected to a file with textmode mounting
  2011-05-18  6:27 1.7.9: Problem with line endings of Perl output redirected to a file with textmode mounting Sven Severus
@ 2011-05-24  4:20 ` Reini Urban
  2011-06-07 13:40   ` Reini Urban
  2011-05-25  6:17 ` Sven Severus
  1 sibling, 1 reply; 5+ messages in thread
From: Reini Urban @ 2011-05-24  4:20 UTC
  To: cygwin; +Cc: pp

2011/5/18 Sven Severus:
> Is this a bug in the output buffering mechanism of Cygwin Perl?
> Or am I doing something wrong?
> Any answer is highly appreciated. Thanks in advance.

Yes, this looks like a PerlIO buffering bug for MSWin32 and cygwin.
The last char of the buffer is not stored when checking the first char
of the new buffer.
I think first we have to provide a sample test case to perl core.
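
The suspected mechanism can be modelled in a few lines. The following
Python sketch is only a model of the hypothesis stated above: a translator
that processes one buffer at a time and adds <cr> before any <lf> that is
not already preceded by <cr>. It is not the PerlIO or Cygwin source, and
all names in it are invented. With remember_last_byte=False the translator
forgets the final byte of the previous buffer, so a <lf> at the start of a
buffer always gets a new <cr>.

```python
from typing import Iterable, Iterator

CR, LF = 0x0D, 0x0A


def split_into_buffers(data: bytes, size: int = 4096) -> Iterator[bytes]:
    """Yield data in consecutive buffers of at most 'size' bytes."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def translate_lf_to_crlf(buffers: Iterable[bytes], remember_last_byte: bool) -> bytes:
    """Insert CR before every LF that is not already preceded by CR."""
    out = bytearray()
    prev = None  # last byte seen so far
    for buf in buffers:
        if not remember_last_byte:
            prev = None  # modelled bug: state is lost at each buffer edge
        for byte in buf:
            if byte == LF and prev != CR:
                out.append(CR)
            out.append(byte)
            prev = byte
    return bytes(out)
```

In this model, data with <cr> at offset 4095 and <lf> at offset 4096, fed
through 4096-byte buffers with remember_last_byte=False, comes out one byte
longer with <cr> <cr> <lf> at the boundary, which matches the reported
symptom. That agreement alone does not show which layer is at fault.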
-- 
Reini Urban
http://phpwiki.org/           http://murbreak.at/

* Re: 1.7.9: Problem with line endings of Perl output redirected to a file with textmode mounting
  2011-05-18  6:27 1.7.9: Problem with line endings of Perl output redirected to a file with textmode mounting Sven Severus
  2011-05-24  4:20 ` Reini Urban
@ 2011-05-25  6:17 ` Sven Severus
  1 sibling, 0 replies; 5+ messages in thread
From: Sven Severus @ 2011-05-25  6:17 UTC
  To: cygwin

Tue, 24 May 2011, Reini Urban wrote:
> Yes, this looks like a PerlIO buffering bug for MSWin32 and cygwin.
> The last char of the buffer is not stored when checking the first char
> of the new buffer.
> I think first we have to provide a sample test case to perl core.

Thank you for your response.
Sven

* Re: 1.7.9: Problem with line endings of Perl output redirected to a file with textmode mounting
  2011-05-24  4:20 ` Reini Urban
@ 2011-06-07 13:40   ` Reini Urban
  2011-06-07 15:55     ` Craig A. Berry
  0 siblings, 1 reply; 5+ messages in thread
From: Reini Urban @ 2011-06-07 13:40 UTC
  To: cygwin; +Cc: pp

2011/5/24 Reini Urban:
> 2011/5/18 Sven Severus:
>> Is this a bug in the output buffering mechanism of Cygwin Perl?
>> Or am I doing something wrong?
>> Any answer is highly appreciated. Thanks in advance.
>
> Yes, this looks like a PerlIO buffering bug for MSWin32 and cygwin.
> The last char of the buffer is not stored when checking the first char
> of the new buffer.
> I think first we have to provide a sample test case to perl core.

I could not reproduce it in perl core with the PerlIO :crlf layer, see
attached test.
I'm investigating cygwin buffer edge-case handling now.

-- 
Reini Urban

[-- Attachment #2: crlf-bufedge.patch --]
[-- Type: application/octet-stream, Size: 997 bytes --]

difforig t/io/crlf.t

diff -u t/io/crlf.t.orig t/io/crlf.t
--- t/io/crlf.t.orig	2011-03-28 21:59:51.729376900 +0200
+++ t/io/crlf.t	2011-06-07 15:34:07.808130000 +0200
@@ -10,10 +10,10 @@
 use Config;
 
 
-my $file = tempfile();
+my $file = "xx"; #tempfile();
 
 {
-    plan(tests => 16);
+    plan(tests => 20);
     ok(open(FOO,">:crlf",$file));
     ok(print FOO 'a'.((('a' x 14).qq{\n}) x 2000) || close(FOO));
     ok(open(FOO,"<:crlf",$file));
@@ -70,6 +70,22 @@
 	    unlike($foo, qr/\x0d\x0d/);
 	}
     }
+
+    # [perl 58xxxx] 4096 bufsize edge-case: \r<bufend>\n not detected
+    # => \r<bufend>\r\n
+    open(FOO,">:crlf",$file);
+    print FOO ('.' x 4095).qq{\n};
+    close(FOO);
+    ok (-s $file == 4097);
+    open(FOO,"<:crlf",$file);
+
+    { local $/; $text = <FOO> }
+    is(count_chars($text, "\015\012"), 0);
+    is(count_chars($text, "\n"), 1);
+    open(FOO, ">:crlf", "$file");
+    print FOO $text;
+    close FOO;
+    ok (-s $file == 4097);
 }
 
 sub count_chars {

* Re: 1.7.9: Problem with line endings of Perl output redirected to a file with textmode mounting
  2011-06-07 13:40   ` Reini Urban
@ 2011-06-07 15:55     ` Craig A. Berry
  0 siblings, 0 replies; 5+ messages in thread
From: Craig A. Berry @ 2011-06-07 15:55 UTC
  To: Reini Urban; +Cc: cygwin, pp

On Tue, Jun 7, 2011 at 8:40 AM, Reini Urban <rurban@x-ray.at> wrote:
> 2011/5/24 Reini Urban:
>> 2011/5/18 Sven Severus:

>>> Oops! The original <cr> <lf> sequence starting at offset 4095 (0xfff)
>>> became a three-character sequence <cr> <cr> <lf>! The <cr> is duplicated!
>>>
>>> In other files created by Perl with output redirection I observed this
>>> behaviour for every <cr> <lf> line ending that is split by a 4096-byte
>>> boundary (even multiple times in one output file). Line endings not
>>> split by a 4096-byte boundary do not show this behaviour.

>>> Is this a bug in the output buffering mechanism of Cygwin Perl?
>>> Or am I doing something wrong?
>>> Any answer is highly appreciated. Thanks in advance.
>>
>> Yes, this looks like a PerlIO buffering bug for MSWin32 and cygwin.
>> The last char of the buffer is not stored when checking the first char
>> of the new buffer.
>> I think first we have to provide a sample test case to perl core.
>
> I could not reproduce it in perl core with the PerlIO :crlf layer, see
> attached test.
> I'm investigating cygwin buffer edge-case handling now.

I don't see Perl versions mentioned here, but note that the default
PerlIO buffer size was 4096 for many years until 5.14.0, at which
point it became the larger of 8192 and BUFSIZ.  So if you're testing
with blead, you may see the behavior at a different offset than 4096,
assuming this involves something getting tangled up between PerlIO
buffer flushing and the Windows-specific crlf layer managing its line
endings.
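
To see which line endings would be at risk for a given buffer size, the
boundary offsets can be computed directly. This Python sketch is not from
the thread and its function names are hypothetical. The buffer size is a
parameter, and the helper for the 5.14.0 default simply encodes the rule
stated above, with the platform's BUFSIZ value supplied by the caller.

```python
from typing import List


def default_bufsize_since_5_14(bufsiz: int) -> int:
    """Buffer size per the rule quoted above: the larger of 8192 and BUFSIZ."""
    return max(8192, bufsiz)


def crlf_split_offsets(data: bytes, bufsize: int) -> List[int]:
    """Offsets of each CR that is the last byte of a buffer and is followed by LF."""
    return [
        pos
        for pos in range(bufsize - 1, len(data) - 1, bufsize)
        if data[pos:pos + 2] == b"\r\n"
    ]
```

A test file built for a 4096-byte buffer places its <cr> at offset 4095.
With an 8192-byte buffer that offset is no longer a buffer edge, so the
same file may not trigger the behaviour on a newer Perl.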
