public inbox for libc-help@sourceware.org
 help / color / mirror / Atom feed
* is this a bug in glibc or readpst?
@ 2022-11-23  2:02 Paul Wise
  2022-11-23  8:57 ` Florian Weimer
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Wise @ 2022-11-23  2:02 UTC (permalink / raw)
  To: libc-help


[-- Attachment #1.1: Type: text/plain, Size: 2336 bytes --]

Hi folks,

I have a strange bug that might be an issue in glibc or it might be a
bug in how readpst works and or uses freopen in multi-process mode.

This is a summary of how the debugging process went up until now:

readpst from Debian buster in multi-process mode works but readpst from
Debian bullseye randomly loses some data. Current readpst works on
Debian buster but not Debian bullseye. The problem isn't related to the
GCC optimisation level. The problem isn't compiler related, clang
exhibits the problem too. Upgrading libc6 from 2.28-10 to 2.29-1 caused
the issue. Bisecting glibc pointed at commit 0b727ed4d, which is titled
"libio: Flush stream at freopen (BZ#21037)" and looks legitimate as it
aligns glibc freopen with POSIX specifications. readpst is using
freopen() after fork() to get new *.pst FILE pointers for child
processes. Both the parent and child FILE are opened read-only. The
FILE position is 0 after freopen for both scenarios. readpst seems to
be skipping some PST file blocks in the broken scenario. The debug logs
seem to indicate that in the broken scenario it reads data from a wrong
location, even though the file position is 0 after freopen. Switching
the readpst code to use fclose()+fopen() after fork() instead of
freopen() after fork() fixes the issue.

Here are the glibc freopen commit and the initial libpst bug report.

https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=0b727ed4d605d9318cb0d323c88abb0d5a441a9b
https://github.com/pst-format/libpst/issues/7

The readpst forking and freopen functions are here:

https://github.com/pst-format/libpst/blob/main/src/readpst.c#L203
https://github.com/pst-format/libpst/blob/main/src/libpst.c#L395

I have been debugging this using the attached scripts on a Debian
system. The outside script sets up a chroot using Debian schroot and
then runs the inside script to do the test inside the chroot. If you
want to run the inside script you may want to customise the paths it
uses so the script doesn't put files in your home directory.

I am hoping someone can help give me an idea if this is likely to be a
problem in glibc or readpst/libpst and or what other debugging strategy
might be useful to figure that out and pinpoint where the problem is.

-- 
bye,
pabs

https://bonedaddy.net/pabs3/

[-- Attachment #1.2: outside --]
[-- Type: application/x-shellscript, Size: 698 bytes --]

[-- Attachment #1.3: inside --]
[-- Type: application/x-shellscript, Size: 1244 bytes --]

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: is this a bug in glibc or readpst?
  2022-11-23  2:02 is this a bug in glibc or readpst? Paul Wise
@ 2022-11-23  8:57 ` Florian Weimer
  2022-11-24  0:06   ` Paul Wise
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2022-11-23  8:57 UTC (permalink / raw)
  To: Paul Wise via Libc-help; +Cc: Paul Wise

* Paul Wise via Libc-help:

> readpst from Debian buster in multi-process mode works but readpst from
> Debian bullseye randomly loses some data. Current readpst works on
> Debian buster but not Debian bullseye. The problem isn't related to the
> GCC optimisation level. The problem isn't compiler related, clang
> exhibits the problem too. Upgrading libc6 from 2.28-10 to 2.29-1 caused
> the issue. Bisecting glibc pointed at commit 0b727ed4d, which is titled
> "libio: Flush stream at freopen (BZ#21037)" and looks legitimate as it
> aligns glibc freopen with POSIX specifications. readpst is using
> freopen() after fork() to get new *.pst FILE pointers for child
> processes. Both the parent and child FILE are opened read-only. The
> FILE position is 0 after freopen for both scenarios. readpst seems to
> be skipping some PST file blocks in the broken scenario. The debug logs
> seem to indicate that in the broken scenario it reads data from a wrong
> location, even though the file position is 0 after freopen. Switching
> the readpst code to use fclose()+fopen() after fork() instead of
> freopen() after fork() fixes the issue.

Fork still shares the underlying file description.  It only duplicates
the descriptors.  If the subprocess changes the file pointer back to 0,
it will affect the original process, too.  This is just how file
descriptors work.

Could this explain the issue?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: is this a bug in glibc or readpst?
  2022-11-23  8:57 ` Florian Weimer
@ 2022-11-24  0:06   ` Paul Wise
  2022-12-02 17:37     ` Florian Weimer
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Wise @ 2022-11-24  0:06 UTC (permalink / raw)
  To: Florian Weimer, Libc-help

[-- Attachment #1: Type: text/plain, Size: 921 bytes --]

On Wed, 2022-11-23 at 09:57 +0100, Florian Weimer via Libc-help wrote:

> Fork still shares the underlying file description.  It only duplicates
> the descriptors.  If the subprocess changes the file pointer back to 0,
> it will affect the original process, too.  This is just how file
> descriptors work.
> 
> Could this explain the issue?

That does sound like what the symptoms would indicate.

However it doesn't seem to fit, because the first thing the child
process does is call freopen(), which should in theory close the
descriptor and then create a new file descriptor and file description?

On the other hand, the fact that using fclose()+fopen() instead of
freopen() does seem to fix the problem, seems to indicate that
freopen() is not actually successfully closing the descriptor and
thus the file pointer is shared between processes?

-- 
bye,
pabs

https://bonedaddy.net/pabs3/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: is this a bug in glibc or readpst?
  2022-11-24  0:06   ` Paul Wise
@ 2022-12-02 17:37     ` Florian Weimer
  2023-08-06  6:07       ` Paul Wise
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2022-12-02 17:37 UTC (permalink / raw)
  To: Paul Wise; +Cc: Libc-help

* Paul Wise:

> On Wed, 2022-11-23 at 09:57 +0100, Florian Weimer via Libc-help wrote:
>
>> Fork still shares the underlying file description.  It only duplicates
>> the descriptors.  If the subprocess changes the file pointer back to 0,
>> it will affect the original process, too.  This is just how file
>> descriptors work.
>> 
>> Could this explain the issue?
>
> That does sound like what the symptoms would indicate.
>
> However it doesn't seem to fit, because the first thing the child
> process does is call freopen(), which should in theory close the
> descriptor and then create a new file descriptor and file description?
>
> On the other hand, the fact that using fclose()+fopen() instead of
> freopen() does seem to fix the problem, seems to indicate that
> freopen() is not actually successfully closing the descriptor and
> thus the file pointer is shared between processes?

Could you run it under strace -k, with debuginfo?  Hopefully that will
shed some light on the problem.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: is this a bug in glibc or readpst?
  2022-12-02 17:37     ` Florian Weimer
@ 2023-08-06  6:07       ` Paul Wise
  2023-08-07  8:59         ` Florian Weimer
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Wise @ 2023-08-06  6:07 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Libc-help

[-- Attachment #1: Type: text/plain, Size: 2543 bytes --]

On Fri, 2022-12-02 at 18:37 +0100, Florian Weimer wrote:

> Could you run it under strace -k, with debuginfo?  Hopefully that will
> shed some light on the problem.

Finally got back to working on this.

With glibc 2.39+ freopen these calls happen, bug present:

   lseek(3, -2138, SEEK_CUR)                 = 268198
   openat(AT_FDCWD, "example.pst", O_RDONLY) = 4
   dup3(4, 3, 0)                             = 3
   close(4)                                  = 0

With glibc 2.38 freopen these calls happen, bug absent:

   openat(AT_FDCWD, "example.pst", O_RDONLY) = 4
   dup3(4, 3, 0)                             = 3
   close(4)                                  = 0

With fclose+fopen these calls happen, bug absent:

   close(3)                                = 0
   openat(AT_FDCWD, "example.pst", O_RDONLY) = 3
   
With fflush+fclose+fopen these calls happen, bug present:

   lseek(3, -2138, SEEK_CUR)               = 268198
   close(3)                                = 0
   openat(AT_FDCWD, "example.pst", O_RDONLY) = 3

So the problem is that glibc freopen calls lseek on the existing file
descriptor, modifying the shared file description. So it seems like the
child process is interfering with the parent process file position.

Looking at the stack traces, the _IO_SYNC call introduced in the commit
0b727ed4d that I identified as triggering the readpst bug is definitely
the source of the lseek call.

   lseek(3, -2138, SEEK_CUR)               = 268198
    > /lib/x86_64-linux-gnu/libc.so.6(lseek64+0x7) [0xf81b7]
    > /lib/x86_64-linux-gnu/libc.so.6(_IO_file_sync+0x7e) [0x800ae]
    > /lib/x86_64-linux-gnu/libc.so.6(freopen+0x8c) [0x7d95c]
    > /usr/bin/readpst(pst_reopen+0x36) [0x102c6]
    > /usr/bin/readpst(try_fork+0x88) [0x4398]
    > /usr/bin/readpst(process+0x121) [0xa451]
    > /usr/bin/readpst(main+0x555) [0x3cc5]
    > /lib/x86_64-linux-gnu/libc.so.6(__libc_init_first+0x8a) [0x271ca]
    > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x27285]
    > /usr/bin/readpst(_start+0x21) [0x4061]

Based on the same thing happening with fflush+fclose+fopen, I'm not
entirely sure, but I think that this represents a POSIX compliance bug
in readpst that was papered over by the earlier behaviour of glibc.

Is my reasoning correct here?
   
For my notes:

   strace -o strace/readpst -e trace='!read,write' --follow-forks --output-separately --stack-traces readpst -j 2 -D -M -b example.pst
   
-- 
bye,
pabs

https://bonedaddy.net/pabs3/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: is this a bug in glibc or readpst?
  2023-08-06  6:07       ` Paul Wise
@ 2023-08-07  8:59         ` Florian Weimer
  2023-08-07 11:00           ` Paul Wise
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2023-08-07  8:59 UTC (permalink / raw)
  To: Paul Wise via Libc-help; +Cc: Paul Wise

* Paul Wise via Libc-help:

> Looking at the stack traces, the _IO_SYNC call introduced in the commit
> 0b727ed4d that I identified as triggering the readpst bug is definitely
> the source of the lseek call.
>
>    lseek(3, -2138, SEEK_CUR)               = 268198
>     > /lib/x86_64-linux-gnu/libc.so.6(lseek64+0x7) [0xf81b7]
>     > /lib/x86_64-linux-gnu/libc.so.6(_IO_file_sync+0x7e) [0x800ae]
>     > /lib/x86_64-linux-gnu/libc.so.6(freopen+0x8c) [0x7d95c]
>     > /usr/bin/readpst(pst_reopen+0x36) [0x102c6]
>     > /usr/bin/readpst(try_fork+0x88) [0x4398]
>     > /usr/bin/readpst(process+0x121) [0xa451]
>     > /usr/bin/readpst(main+0x555) [0x3cc5]
>     > /lib/x86_64-linux-gnu/libc.so.6(__libc_init_first+0x8a) [0x271ca]
>     > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x27285]
>     > /usr/bin/readpst(_start+0x21) [0x4061]
>
> Based on the same thing happening with fflush+fclose+fopen, I'm not
> entirely sure, but I think that this represents a POSIX compliance bug
> in readpst that was papered over by the earlier behaviour of glibc.

We could perhaps avoid the seek for read-only streams.  We are about to
close the stream anyway, so fflush does not have any useful effect
anyway.  Not sure what others think if it is worth the complexity.  On
the other hand, preserving backwards compatibility here does not appear
to be too onerous.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: is this a bug in glibc or readpst?
  2023-08-07  8:59         ` Florian Weimer
@ 2023-08-07 11:00           ` Paul Wise
  2023-08-07 11:46             ` Florian Weimer
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Wise @ 2023-08-07 11:00 UTC (permalink / raw)
  To: Florian Weimer, Libc-help

[-- Attachment #1: Type: text/plain, Size: 569 bytes --]

On Mon, 2023-08-07 at 10:59 +0200, Florian Weimer wrote:

> We could perhaps avoid the seek for read-only streams.  We are about to
> close the stream anyway, so fflush does not have any useful effect
> anyway.  Not sure what others think if it is worth the complexity.  On
> the other hand, preserving backwards compatibility here does not appear
> to be too onerous.

It might be worth checking what other implementations do, perhaps
if they all avoid the seek in this case then maybe glibc should too?

-- 
bye,
pabs

https://bonedaddy.net/pabs3/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: is this a bug in glibc or readpst?
  2023-08-07 11:00           ` Paul Wise
@ 2023-08-07 11:46             ` Florian Weimer
  2023-08-07 11:58               ` Paul Wise
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2023-08-07 11:46 UTC (permalink / raw)
  To: Paul Wise; +Cc: Libc-help

* Paul Wise:

> On Mon, 2023-08-07 at 10:59 +0200, Florian Weimer wrote:
>
>> We could perhaps avoid the seek for read-only streams.  We are about to
>> close the stream anyway, so fflush does not have any useful effect
>> anyway.  Not sure what others think if it is worth the complexity.  On
>> the other hand, preserving backwards compatibility here does not appear
>> to be too onerous.
>
> It might be worth checking what other implementations do, perhaps
> if they all avoid the seek in this case then maybe glibc should too?

I expect this is difficult to figure out for certain because of the
internal stream state may not be very clear unless you are familiar with
the code. 8-(

So I'd prefer to fix the glibc regression instead.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: is this a bug in glibc or readpst?
  2023-08-07 11:46             ` Florian Weimer
@ 2023-08-07 11:58               ` Paul Wise
  0 siblings, 0 replies; 9+ messages in thread
From: Paul Wise @ 2023-08-07 11:58 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Libc-help

[-- Attachment #1: Type: text/plain, Size: 696 bytes --]

On Mon, 2023-08-07 at 13:46 +0200, Florian Weimer wrote:

> I expect this is difficult to figure out for certain because of the
> internal stream state may not be very clear unless you are familiar with
> the code. 8-(

I just got confirmation that the same issue is present on Alpine Linux,
which uses the musl libc IIRC and it confirmed with musl-gcc on Debian.

https://paste.debian.net/plainh/47ef35e3/
https://paste.debian.net/1288127/

> So I'd prefer to fix the glibc regression instead.

OK. Given the likely widely varying behaviour in libc implementations,
probably I should also fix the bug in libpst and readpst too.

-- 
bye,
pabs

https://bonedaddy.net/pabs3/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-08-07 12:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-23  2:02 is this a bug in glibc or readpst? Paul Wise
2022-11-23  8:57 ` Florian Weimer
2022-11-24  0:06   ` Paul Wise
2022-12-02 17:37     ` Florian Weimer
2023-08-06  6:07       ` Paul Wise
2023-08-07  8:59         ` Florian Weimer
2023-08-07 11:00           ` Paul Wise
2023-08-07 11:46             ` Florian Weimer
2023-08-07 11:58               ` Paul Wise

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).