public inbox for libc-help@sourceware.org
* Why fseek causes a read()?
@ 2020-01-30 23:29 Konstantin Kharlamov
  2020-01-31 16:06 ` Godmar Back
  0 siblings, 1 reply; 6+ messages in thread
From: Konstantin Kharlamov @ 2020-01-30 23:29 UTC (permalink / raw)
  To: libc-help

I was debugging a hang of the hexdump utility (with high CPU load) that
occurred when trying to print a piece of a block device at an offset of
≈0x80000000.

Long story short: the problem came down to the libc implementation of
fseek(). When called with the SEEK_SET argument, for some reason it does
two things, in this order:

1. lseek() called with SEEK_SET
2. read() called, which reads everything lseek just skipped (so e.g. for 
my problem with reading at 0x80000000 it would have to read around half 
a terabyte before returning)

I haven't found anything in the documentation for `fseek()` about it
having to do a read, nor did anything come up in a search engine. So
unless I'm missing something, this looks like an odd bug in fseek.

P.S. To reproduce: save the following code as test.c, build it, and run
it under the `strace` utility. You'll see 30 bytes of `/tmp/test.cpp`
being read just after the lseek call at the end.

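     #include <stdio.h>

     int main(void) {
         FILE* f = fopen("/tmp/test.cpp", "r");
         if (!f) {
             perror("fopen");
             return 1;  /* do not call fseek() on a NULL stream */
         }
         /* Under strace, look for the lseek() and the read() that follows. */
         if (fseek(f, 30, SEEK_SET) != 0)
             perror("fseek");
         fclose(f);
         return 0;
     }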


* Re: Why fseek causes a read()?
  2020-01-30 23:29 Why fseek causes a read()? Konstantin Kharlamov
@ 2020-01-31 16:06 ` Godmar Back
  2020-01-31 18:34   ` Konstantin Kharlamov
  0 siblings, 1 reply; 6+ messages in thread
From: Godmar Back @ 2020-01-31 16:06 UTC (permalink / raw)
  To: Konstantin Kharlamov; +Cc: libc-help

This is an interesting question. It comes down to what semantics are
required of fully buffered streams for files that are essentially subject
to random access.

From a logical point of view, you shouldn't expect to have control over the
read/lseek calls stdio issues since you are using a fully-buffered
abstraction layer (I/O streams).

Perhaps you get better control with unbuffered streams, i.e., if you call
setbuf(f, NULL), does the read() go away?
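A minimal sketch of that suggestion, for anyone who wants to try it under
strace. The helper name open_unbuffered_at() is made up for illustration,
and whether the read() actually disappears depends on the libc in use, so
treat this as an experiment rather than a guaranteed fix.

```c
#include <stdio.h>

/* Hypothetical helper: open `path` for reading, switch the stream to
   unbuffered mode, then seek to `offset`.
   Returns the stream on success, or NULL on failure. */
FILE *open_unbuffered_at(const char *path, long offset)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return NULL;

    /* setbuf() has to come before any other operation on the stream. */
    setbuf(f, NULL);

    if (fseek(f, offset, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }
    return f;
}
```

Calling open_unbuffered_at("/tmp/test.cpp", 30) in place of the
fopen()/fseek() pair from the original test program should make the
difference visible in the strace output.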



* Re: Why fseek causes a read()?
  2020-01-31 16:06 ` Godmar Back
@ 2020-01-31 18:34   ` Konstantin Kharlamov
  2020-01-31 19:16     ` Al
  0 siblings, 1 reply; 6+ messages in thread
From: Konstantin Kharlamov @ 2020-01-31 18:34 UTC (permalink / raw)
  To: Godmar Back; +Cc: libc-help

On 31.01.2020 19:05, Godmar Back wrote:
> 
> This is an interesting question and leads to what semantics is required 
> of fully buffered streams for files that are essentially subject to 
> random access.
> 
>  From a logical point of view, you shouldn't expect to have control over 
> the read/lseek calls stdio issues since you are using a fully-buffered 
> abstraction layer (I/O streams).
> 
> Perhaps you get better control with unbuffered streams, i.e., if you 
> call setbuf(f, NULL), does the read() go away?
> 

Thank you, indeed it does. I sent a fix to hexdump¹.

Anyway, that makes me wonder: is there really any use for this read
inside fseek()? Could it simply be removed from glibc? IMO it's just a
waste of CPU and I/O for all apps that use fseek().

1: https://github.com/karelzak/util-linux/pull/946


* Re: Why fseek causes a read()?
  2020-01-31 18:34   ` Konstantin Kharlamov
@ 2020-01-31 19:16     ` Al
  2020-01-31 21:28       ` Konstantin Kharlamov
  0 siblings, 1 reply; 6+ messages in thread
From: Al @ 2020-01-31 19:16 UTC (permalink / raw)
  To: libc-help

It used to be that the documentation for seek/lseek and fseek suggested
that, if a device was not capable of direct positioning, the library
routines would try to chew through the intervening space via one or more
read() system calls.

I wouldn't want to disable that without understanding the consequences.  
Your experience suggests that that logic may not be functioning 
correctly, but I wouldn't discard it lightly.  It may be better to 
figure out if it is broken, and if so fix it.

I haven't gone through that code in the past, so I am not sure if it has
changed.

Block devices should generally be seekable, but not all are.



* Re: Why fseek causes a read()?
  2020-01-31 19:16     ` Al
@ 2020-01-31 21:28       ` Konstantin Kharlamov
  2020-02-03 16:13         ` Konstantin Kharlamov
  0 siblings, 1 reply; 6+ messages in thread
From: Konstantin Kharlamov @ 2020-01-31 21:28 UTC (permalink / raw)
  To: Al, libc-help

On 31.01.2020 22:16, Al wrote:
> It used to be that the documentation for seek/lseek and fseek suggested if a device was not capable of direct positioning, that the
> library routines would try to chew through the intervening space via a (or many) read() system call(s).
> 
> I wouldn't want to disable that without understanding the consequences. Your experience suggests that that logic may not be functioning correctly, but I wouldn't discard it lightly.  It may be better to figure out if it is broken, and if so fix it.
> 
> I haven't gone through that code in the past, so I am not sure if it has changed.
> 
> Block devices should generally be seekable, but not all are.

But if the device is not seekable, wouldn't `lseek()` (which is used inside fseek() just before calling the read()) just return -1? glibc could check for that, and fall back to repositioning with read() only in that case.
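To make the idea concrete, here is a rough sketch at the file-descriptor level. It is not glibc's code; skip_forward() is a hypothetical helper, and it assumes an unseekable descriptor reports ESPIPE from lseek(), as pipes do.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: advance `fd` by `count` bytes.
   Tries lseek() first; only if the descriptor is unseekable (ESPIPE)
   does it fall back to reading and discarding the data.
   Returns 0 on success, -1 on error or premature end of input. */
int skip_forward(int fd, off_t count)
{
    if (lseek(fd, count, SEEK_CUR) != (off_t)-1)
        return 0;                 /* seekable: nothing had to be read */
    if (errno != ESPIPE || count < 0)
        return -1;                /* real error, or cannot go backwards */

    char buf[4096];
    while (count > 0) {
        size_t want = count < (off_t)sizeof buf ? (size_t)count : sizeof buf;
        ssize_t got = read(fd, buf, want);
        if (got < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal, retry */
            return -1;
        }
        if (got == 0)
            return -1;            /* input ended before the target */
        count -= got;
    }
    return 0;
}
```

With this shape, a regular file or a seekable block device never pays for a read(), and only pipes and similar descriptors take the slow path.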


* Re: Why fseek causes a read()?
  2020-01-31 21:28       ` Konstantin Kharlamov
@ 2020-02-03 16:13         ` Konstantin Kharlamov
  0 siblings, 0 replies; 6+ messages in thread
From: Konstantin Kharlamov @ 2020-02-03 16:13 UTC (permalink / raw)
  To: libc-help

Okay, I reported a bug: https://sourceware.org/bugzilla/show_bug.cgi?id=25497
