public inbox for glibc-bugs@sourceware.org
* [Bug libc/5742] New: stdio poor file buffering in "w+b" mode
@ 2008-02-06 16:29 olivier dot paquet at gmail dot com
  2008-02-06 16:31 ` [Bug libc/5742] " olivier dot paquet at gmail dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: olivier dot paquet at gmail dot com @ 2008-02-06 16:29 UTC (permalink / raw)
  To: glibc-bugs

Opening a file with mode "w+b" and then making repeated fseek() and fwrite()
calls results in pointless read() calls to the kernel. The output of
'strace' on the process looks like this:

_llseek(3, 2424832, [2424832], SEEK_SET) = 0
read(3, "\0`\374\267\1\0\0\0E\3\0"..., 32768) = 14848
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0"..., 2560) = 2560
_llseek(3, 2424832, [2424832], SEEK_SET) = 0
read(3, "\0`\374\267\1\0\0\0E\3\0\0"..., 32768) = 17408
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2560) = 2560

Yet the program contains no fread() calls of any kind. The reads are not
only useless; they also trigger a catastrophic slowdown when this is done over
NFS with recent Linux kernels, as reported in:
http://bugzilla.kernel.org/show_bug.cgi?id=9566

I also checked glibc 2.7 and 2.7-20080204, and both behave in exactly the same
way. Note that the above strace is for a file on NFS; on a local filesystem
glibc appears to use a smaller buffer, but the read() calls are still there.
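
The attached sample program is not reproduced in this thread. As a minimal
sketch of the access pattern described above (the helper name, file name, and
block count are assumptions; only the 2560-byte write size comes from the
strace), something like the following could be run under
'strace -e trace=read,write,lseek,_llseek' to look for the extra read() calls:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 2560  /* matches the write() size in the strace above */

/* Hypothetical helper: seek to each block offset, then write one block.
   Returns the final file size in bytes, or -1 on error. */
long seek_write_pattern(const char *path, const char *mode, int blocks)
{
    assert(blocks >= 0);

    char block[BLOCK_SIZE];
    long size = -1;
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;

    memset(block, 0, sizeof block);
    for (int i = 0; i < blocks; i++) {
        /* An explicit fseek() before every fwrite(), never an fread(). */
        if (fseek(fp, (long)i * BLOCK_SIZE, SEEK_SET) != 0 ||
            fwrite(block, 1, sizeof block, fp) != sizeof block) {
            fclose(fp);
            return -1;
        }
    }

    /* Report the resulting size so the caller can sanity-check the run. */
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);
    if (fclose(fp) != 0)
        return -1;
    return size;
}
```

Calling it once with "w+b" and once with "wb" on the same path should make any
difference in the syscall trace easy to compare. Whether reads show up may
depend on the glibc version and the filesystem.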

-- 
           Summary: stdio poor file buffering in "w+b" mode
           Product: glibc
           Version: 2.3.6
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: drepper at redhat dot com
        ReportedBy: olivier dot paquet at gmail dot com
                CC: glibc-bugs at sources dot redhat dot com
 GCC build triplet: i486-slackware-linux
  GCC host triplet: i486-slackware-linux
GCC target triplet: i486-slackware-linux


http://sourceware.org/bugzilla/show_bug.cgi?id=5742

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.



* [Bug libc/5742] stdio poor file buffering in "w+b" mode
  2008-02-06 16:29 [Bug libc/5742] New: stdio poor file buffering in "w+b" mode olivier dot paquet at gmail dot com
@ 2008-02-06 16:31 ` olivier dot paquet at gmail dot com
  2008-02-21  3:56 ` carlos at codesourcery dot com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: olivier dot paquet at gmail dot com @ 2008-02-06 16:31 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From olivier dot paquet at gmail dot com  2008-02-06 16:30 -------
Created an attachment (id=2232)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=2232&action=view)
sample program to show described behavior



* [Bug libc/5742] stdio poor file buffering in "w+b" mode
  2008-02-06 16:29 [Bug libc/5742] New: stdio poor file buffering in "w+b" mode olivier dot paquet at gmail dot com
  2008-02-06 16:31 ` [Bug libc/5742] " olivier dot paquet at gmail dot com
@ 2008-02-21  3:56 ` carlos at codesourcery dot com
  2008-02-21 14:24 ` olivier dot paquet at gmail dot com
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: carlos at codesourcery dot com @ 2008-02-21  3:56 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From carlos at codesourcery dot com  2008-02-21 03:56 -------
Have you tried using setvbuf to set the stream to unbuffered? Does this make a
difference?

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING



* [Bug libc/5742] stdio poor file buffering in "w+b" mode
  2008-02-06 16:29 [Bug libc/5742] New: stdio poor file buffering in "w+b" mode olivier dot paquet at gmail dot com
  2008-02-06 16:31 ` [Bug libc/5742] " olivier dot paquet at gmail dot com
  2008-02-21  3:56 ` carlos at codesourcery dot com
@ 2008-02-21 14:24 ` olivier dot paquet at gmail dot com
  2008-02-21 15:59 ` carlos at codesourcery dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: olivier dot paquet at gmail dot com @ 2008-02-21 14:24 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From olivier dot paquet at gmail dot com  2008-02-21 14:23 -------
(In reply to comment #2)
> Have you tried using setvbuf to set the stream to unbuffered? Does this make a
> difference?

Yes, it does completely disable buffering and the calls seem to be simply passed
directly to the kernel. So it avoids the dummy reads but it forces me to do my
own buffering where it's needed (because according to the man page setvbuf can
only be used on a newly opened stream).
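
For reference, a minimal sketch of the unbuffered setup discussed here. The
helper name is hypothetical; setvbuf() and _IONBF are standard C, and the call
is made right after fopen(), before any other operation on the stream:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open a stream and disable stdio buffering
   before any other operation is performed on it. */
FILE *open_unbuffered(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);

    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return NULL;

    /* _IONBF: reads and writes are passed on without a stdio buffer. */
    if (setvbuf(fp, NULL, _IONBF, 0) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

With this setup every fwrite() becomes its own write() call, so small writes
that used to be coalesced by stdio now need application-level buffering.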


* [Bug libc/5742] stdio poor file buffering in "w+b" mode
  2008-02-06 16:29 [Bug libc/5742] New: stdio poor file buffering in "w+b" mode olivier dot paquet at gmail dot com
                   ` (2 preceding siblings ...)
  2008-02-21 14:24 ` olivier dot paquet at gmail dot com
@ 2008-02-21 15:59 ` carlos at codesourcery dot com
  2008-02-21 16:22 ` olivier dot paquet at gmail dot com
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: carlos at codesourcery dot com @ 2008-02-21 15:59 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From carlos at codesourcery dot com  2008-02-21 15:59 -------
Is it acceptable to open the file with "wb" instead of "w+b"? This would avoid
the spurious read.

Requesting that a file be opened for update "+" forces the implementation to
expect that a read *or* write could occur (as per the standard). The
implementation chooses to fill the stream buffer after the seek.


* [Bug libc/5742] stdio poor file buffering in "w+b" mode
  2008-02-06 16:29 [Bug libc/5742] New: stdio poor file buffering in "w+b" mode olivier dot paquet at gmail dot com
                   ` (3 preceding siblings ...)
  2008-02-21 15:59 ` carlos at codesourcery dot com
@ 2008-02-21 16:22 ` olivier dot paquet at gmail dot com
  2008-03-14 17:10 ` rsa at us dot ibm dot com
  2010-04-16  7:54 ` wucknitz at astro dot uni-bonn dot de
  6 siblings, 0 replies; 8+ messages in thread
From: olivier dot paquet at gmail dot com @ 2008-02-21 16:22 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From olivier dot paquet at gmail dot com  2008-02-21 16:22 -------
(In reply to comment #4)
> Is it acceptable to open the file with "wb" instead of "w+b"? This would avoid
> the spurious read.

Not in this case unfortunately. The file is written to most of the time but
there are occasional reads.

> Requesting that a file be opened for update "+" forces the implementation to
> expect that a read *or* write could occur (as per the standard). The
> implementation chooses to fill the stream buffer after the seek.

Yes, I am quite aware of that. It's that algorithm (or choice if you want to
call it that) which is a little simplistic. The implementation could very well
delay filling its buffer until the first read or non-contiguous write. This
would only require a pointer to remember that "the buffer is valid up to here".
A write can push this pointer forward without the need for reading data from the
file. A read obviously needs to get some real data. A small seek which does not
cause the buffer to move (if such a thing is possible, I wasn't able to find the
buffering logic in the libc code) also needs to read some data to avoid
uninitialized holes in the buffer which would prevent writing it back to the
file in a single chunk.

You could push the idea to the extreme and have a single 'validity' bit for each
byte in the buffer and act optimally according to those. That would be overkill
as the idea above would handle most interesting cases with far less overhead and
risk of bugs.
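
To make the "valid up to here" idea concrete, here is a toy model of it. This
is not glibc code, and all names are made up for illustration. No file is
touched: the struct only counts the reads and write-backs that a stream would
have to issue under this policy. Seeks that stay inside the buffer (the
hole-filling case mentioned above) are left out to keep it short:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 4096

/* Toy model: no real file is touched, the struct only counts the
   I/O a stream would have to issue under the proposed policy. */
struct lazy_buf {
    unsigned char data[BUF_SIZE];
    long   base;     /* file offset that data[0] maps to */
    size_t valid;    /* "the buffer is valid up to here" */
    int    fills;    /* reads from the file that were required */
    size_t flushed;  /* bytes that would have been written back */
};

/* Write back the valid prefix in one chunk and advance past it. */
static void lazy_flush(struct lazy_buf *b)
{
    b->flushed += b->valid;   /* a real stream would write() here */
    b->base += (long)b->valid;
    b->valid = 0;
}

/* Seek that moves the buffer: flush, reposition, but do not fill. */
void lazy_seek(struct lazy_buf *b, long offset)
{
    lazy_flush(b);
    b->base = offset;
}

/* Contiguous write at base + valid: pushes the pointer forward
   without reading anything from the file. */
void lazy_write(struct lazy_buf *b, const void *src, size_t n)
{
    assert(n <= BUF_SIZE);
    if (b->valid + n > BUF_SIZE)
        lazy_flush(b);
    memcpy(b->data + b->valid, src, n);
    b->valid += n;
}

/* A read needs real data, so this is where the fill finally happens. */
void lazy_read(struct lazy_buf *b)
{
    lazy_flush(b);
    b->fills++;               /* a real stream would read() here */
}
```

In this model the seek/write/seek/write sequence from the original report
finishes with fills == 0; only a lazy_read() call increments the counter.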


* [Bug libc/5742] stdio poor file buffering in "w+b" mode
  2008-02-06 16:29 [Bug libc/5742] New: stdio poor file buffering in "w+b" mode olivier dot paquet at gmail dot com
                   ` (4 preceding siblings ...)
  2008-02-21 16:22 ` olivier dot paquet at gmail dot com
@ 2008-03-14 17:10 ` rsa at us dot ibm dot com
  2010-04-16  7:54 ` wucknitz at astro dot uni-bonn dot de
  6 siblings, 0 replies; 8+ messages in thread
From: rsa at us dot ibm dot com @ 2008-03-14 17:10 UTC (permalink / raw)
  To: glibc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement



* [Bug libc/5742] stdio poor file buffering in "w+b" mode
  2008-02-06 16:29 [Bug libc/5742] New: stdio poor file buffering in "w+b" mode olivier dot paquet at gmail dot com
                   ` (5 preceding siblings ...)
  2008-03-14 17:10 ` rsa at us dot ibm dot com
@ 2010-04-16  7:54 ` wucknitz at astro dot uni-bonn dot de
  6 siblings, 0 replies; 8+ messages in thread
From: wucknitz at astro dot uni-bonn dot de @ 2010-04-16  7:54 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From wucknitz at astro dot uni-bonn dot de  2010-04-16 07:54 -------
This problem still seems to be around. I wasn't aware of it until I tried a
program that fseeks/freads its way through a big file on a RAID system with a
large st_blksize. While it is fast on a normal disk with st_blksize=4096, it is
incredibly slow on the RAID with st_blksize of ~5.5MB, even though the RAID is
otherwise very fast. In this particular case I have access to the source code
and could set the stream to unbuffered (with setvbuf). That brought the
execution time down from more than 80min to 14sec, i.e. a factor of roughly 350
in a real-life case. There are probably other applications in which the
read-ahead performed by fseek causes huge performance problems. It would be
really nice if this could be fixed.

As a test I tried to skip through blocks of a large file with many fseeks
without actually reading anything. Trying that on different disks and
filesystems, I saw a very clear correlation between st_blksize and execution
time. Particularly on high performance RAID systems you do not really want to
spend most of the time copying huge amounts of data from one buffer to the other.
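
The test program itself is not included here. A rough sketch of such a test,
with hypothetical helper names and relying on POSIX stat() for st_blksize,
might look like this. It does no timing of its own, so it would be run under
'time' or 'strace -c':

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns the st_blksize reported for path, or -1 on error. */
long preferred_block_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_blksize;
}

/* Seeks to `count` offsets `stride` bytes apart without reading.
   Returns the number of successful seeks, or -1 if open fails. */
int skip_through(const char *path, long stride, int count)
{
    assert(stride > 0 && count >= 0);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    int done = 0;
    for (int i = 0; i < count; i++) {
        /* No fread() follows; any read() seen in strace comes from stdio. */
        if (fseek(fp, (long)i * stride, SEEK_SET) != 0)
            break;
        done++;
    }
    fclose(fp);
    return done;
}
```

Comparing the run time against preferred_block_size() on several filesystems
would be one way to check for the correlation described above. The result is
likely to depend on the glibc version in use.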
