* is this a bug in glibc or readpst? @ 2022-11-23 2:02 Paul Wise 2022-11-23 8:57 ` Florian Weimer 0 siblings, 1 reply; 9+ messages in thread From: Paul Wise @ 2022-11-23 2:02 UTC (permalink / raw) To: libc-help [-- Attachment #1.1: Type: text/plain, Size: 2336 bytes --] Hi folks, I have a strange bug that might be an issue in glibc or it might be a bug in how readpst works and or uses freopen in multi-process mode. This is a summary of how the debugging process went up until now: readpst from Debian buster in multi-process mode works but readpst from Debian bullseye randomly loses some data. Current readpst works on Debian buster but not Debian bullseye. The problem isn't related to the GCC optimisation level. The problem isn't compiler related, clang exhibits the problem too. Upgrading libc6 from 2.28-10 to 2.29-1 caused the issue. Bisecting glibc pointed at commit 0b727ed4d, which is titled "libio: Flush stream at freopen (BZ#21037)" and looks legitimate as it aligns glibc freopen with POSIX specifications. readpst is using freopen() after fork() to get new *.pst FILE pointers for child processes. Both the parent and child FILE are opened read-only. The FILE position is 0 after freopen for both scenarios. readpst seems to be skipping some PST file blocks in the broken scenario. The debug logs seem to indicate that in the broken scenario it reads data from a wrong location, even though the file position is 0 after freopen. Switching the readpst code to use fclose()+fopen() after fork() instead of freopen() after fork() fixes the issue. Here are the glibc freopen commit and the initial libpst bug report. https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=0b727ed4d605d9318cb0d323c88abb0d5a441a9b https://github.com/pst-format/libpst/issues/7 The readpst forking and freopen functions are here: https://github.com/pst-format/libpst/blob/main/src/readpst.c#L203 https://github.com/pst-format/libpst/blob/main/src/libpst.c#L395 I have been debugging this using the attached scripts on a Debian system. The outside script sets up a chroot using Debian schroot and then runs the inside script to do the test inside the chroot. If you want to run the inside script you may want to customise the paths it uses so the script doesn't put files in your home directory. I am hoping someone can help give me an idea if this is likely to be a problem in glibc or readpst/libpst and or what other debugging strategy might be useful to figure that out and pinpoint where the problem is. -- bye, pabs https://bonedaddy.net/pabs3/ [-- Attachment #1.2: outside --] [-- Type: application/x-shellscript, Size: 698 bytes --] [-- Attachment #1.3: inside --] [-- Type: application/x-shellscript, Size: 1244 bytes --] [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: is this a bug in glibc or readpst? 2022-11-23 2:02 is this a bug in glibc or readpst? Paul Wise @ 2022-11-23 8:57 ` Florian Weimer 2022-11-24 0:06 ` Paul Wise 0 siblings, 1 reply; 9+ messages in thread From: Florian Weimer @ 2022-11-23 8:57 UTC (permalink / raw) To: Paul Wise via Libc-help; +Cc: Paul Wise * Paul Wise via Libc-help: > readpst from Debian buster in multi-process mode works but readpst from > Debian bullseye randomly loses some data. Current readpst works on > Debian buster but not Debian bullseye. The problem isn't related to the > GCC optimisation level. The problem isn't compiler related, clang > exhibits the problem too. Upgrading libc6 from 2.28-10 to 2.29-1 caused > the issue. Bisecting glibc pointed at commit 0b727ed4d, which is titled > "libio: Flush stream at freopen (BZ#21037)" and looks legitimate as it > aligns glibc freopen with POSIX specifications. readpst is using > freopen() after fork() to get new *.pst FILE pointers for child > processes. Both the parent and child FILE are opened read-only. The > FILE position is 0 after freopen for both scenarios. readpst seems to > be skipping some PST file blocks in the broken scenario. The debug logs > seem to indicate that in the broken scenario it reads data from a wrong > location, even though the file position is 0 after freopen. Switching > the readpst code to use fclose()+fopen() after fork() instead of > freopen() after fork() fixes the issue. Fork still shares the underlying file description. It only duplicates the descriptors. If the subprocess changes the file pointer back to 0, it will affect the original process, too. This is just how file descriptors work. Could this explain the issue? Thanks, Florian ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: is this a bug in glibc or readpst? 2022-11-23 8:57 ` Florian Weimer @ 2022-11-24 0:06 ` Paul Wise 2022-12-02 17:37 ` Florian Weimer 0 siblings, 1 reply; 9+ messages in thread From: Paul Wise @ 2022-11-24 0:06 UTC (permalink / raw) To: Florian Weimer, Libc-help [-- Attachment #1: Type: text/plain, Size: 921 bytes --] On Wed, 2022-11-23 at 09:57 +0100, Florian Weimer via Libc-help wrote: > Fork still shares the underlying file description. It only duplicates > the descriptors. If the subprocess changes the file pointer back to 0, > it will affect the original process, too. This is just how file > descriptors work. > > Could this explain the issue? That does sound like what the symptoms would indicate. However it doesn't seem to fit, because the first thing the child process does is call freopen(), which should in theory close the descriptor and then create a new file descriptor and file description? On the other hand, the fact that using fclose()+fopen() instead of freopen() does seem to fix the problem, seems to indicate that freopen() is not actually successfully closing the descriptor and thus the file pointer is shared between processes? -- bye, pabs https://bonedaddy.net/pabs3/ [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: is this a bug in glibc or readpst? 2022-11-24 0:06 ` Paul Wise @ 2022-12-02 17:37 ` Florian Weimer 2023-08-06 6:07 ` Paul Wise 0 siblings, 1 reply; 9+ messages in thread From: Florian Weimer @ 2022-12-02 17:37 UTC (permalink / raw) To: Paul Wise; +Cc: Libc-help * Paul Wise: > On Wed, 2022-11-23 at 09:57 +0100, Florian Weimer via Libc-help wrote: > >> Fork still shares the underlying file description. It only duplicates >> the descriptors. If the subprocess changes the file pointer back to 0, >> it will affect the original process, too. This is just how file >> descriptors work. >> >> Could this explain the issue? > > That does sound like what the symptoms would indicate. > > However it doesn't seem to fit, because the first thing the child > process does is call freopen(), which should in theory close the > descriptor and then create a new file descriptor and file description? > > On the other hand, the fact that using fclose()+fopen() instead of > freopen() does seem to fix the problem, seems to indicate that > freopen() is not actually successfully closing the descriptor and > thus the file pointer is shared between processes? Could you run it under strace -k, with debuginfo? Hopefully that will shed some light on the problem. Thanks, Florian ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: is this a bug in glibc or readpst? 2022-12-02 17:37 ` Florian Weimer @ 2023-08-06 6:07 ` Paul Wise 2023-08-07 8:59 ` Florian Weimer 0 siblings, 1 reply; 9+ messages in thread From: Paul Wise @ 2023-08-06 6:07 UTC (permalink / raw) To: Florian Weimer; +Cc: Libc-help [-- Attachment #1: Type: text/plain, Size: 2543 bytes --] On Fri, 2022-12-02 at 18:37 +0100, Florian Weimer wrote: > Could you run it under strace -k, with debuginfo? Hopefully that will > shed some light on the problem. Finally got back to working on this. With glibc 2.39+ freopen these calls happen, bug present: lseek(3, -2138, SEEK_CUR) = 268198 openat(AT_FDCWD, "example.pst", O_RDONLY) = 4 dup3(4, 3, 0) = 3 close(4) = 0 With glibc 2.38 freopen these calls happen, bug absent: openat(AT_FDCWD, "example.pst", O_RDONLY) = 4 dup3(4, 3, 0) = 3 close(4) = 0 With fclose+fopen these calls happen, bug absent: close(3) = 0 openat(AT_FDCWD, "example.pst", O_RDONLY) = 3 With fflush+fclose+fopen these calls happen, bug present: lseek(3, -2138, SEEK_CUR) = 268198 close(3) = 0 openat(AT_FDCWD, "example.pst", O_RDONLY) = 3 So the problem is that glibc freopen calls lseek on the existing file descriptor, modifying the shared file description. So it seems like the child process is interfering with the parent process file position. Looking at the stack traces, the _IO_SYNC call introduced in the commit 0b727ed4d that I identified as triggering the readpst bug is definitely the source of the lseek call. lseek(3, -2138, SEEK_CUR) = 268198 > /lib/x86_64-linux-gnu/libc.so.6(lseek64+0x7) [0xf81b7] > /lib/x86_64-linux-gnu/libc.so.6(_IO_file_sync+0x7e) [0x800ae] > /lib/x86_64-linux-gnu/libc.so.6(freopen+0x8c) [0x7d95c] > /usr/bin/readpst(pst_reopen+0x36) [0x102c6] > /usr/bin/readpst(try_fork+0x88) [0x4398] > /usr/bin/readpst(process+0x121) [0xa451] > /usr/bin/readpst(main+0x555) [0x3cc5] > /lib/x86_64-linux-gnu/libc.so.6(__libc_init_first+0x8a) [0x271ca] > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x27285] > /usr/bin/readpst(_start+0x21) [0x4061] Based on the same thing happening with fflush+fclose+fopen, I'm not entirely sure, but I think that this represents a POSIX compliance bug in readpst that was papered over by the earlier behaviour of glibc. Is my reasoning correct here? For my notes: strace -o strace/readpst -e trace='!read,write' --follow-forks --output-separately --stack-traces readpst -j 2 -D -M -b example.pst -- bye, pabs https://bonedaddy.net/pabs3/ [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: is this a bug in glibc or readpst? 2023-08-06 6:07 ` Paul Wise @ 2023-08-07 8:59 ` Florian Weimer 2023-08-07 11:00 ` Paul Wise 0 siblings, 1 reply; 9+ messages in thread From: Florian Weimer @ 2023-08-07 8:59 UTC (permalink / raw) To: Paul Wise via Libc-help; +Cc: Paul Wise * Paul Wise via Libc-help: > Looking at the stack traces, the _IO_SYNC call introduced in the commit > 0b727ed4d that I identified as triggering the readpst bug is definitely > the source of the lseek call. > > lseek(3, -2138, SEEK_CUR) = 268198 > > /lib/x86_64-linux-gnu/libc.so.6(lseek64+0x7) [0xf81b7] > > /lib/x86_64-linux-gnu/libc.so.6(_IO_file_sync+0x7e) [0x800ae] > > /lib/x86_64-linux-gnu/libc.so.6(freopen+0x8c) [0x7d95c] > > /usr/bin/readpst(pst_reopen+0x36) [0x102c6] > > /usr/bin/readpst(try_fork+0x88) [0x4398] > > /usr/bin/readpst(process+0x121) [0xa451] > > /usr/bin/readpst(main+0x555) [0x3cc5] > > /lib/x86_64-linux-gnu/libc.so.6(__libc_init_first+0x8a) [0x271ca] > > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x27285] > > /usr/bin/readpst(_start+0x21) [0x4061] > > Based on the same thing happening with fflush+fclose+fopen, I'm not > entirely sure, but I think that this represents a POSIX compliance bug > in readpst that was papered over by the earlier behaviour of glibc. We could perhaps avoid the seek for read-only streams. We are about to close the stream anyway, so fflush does not have any useful effect anyway. Not sure what others think if it is worth the complexity. On the other hand, preserving backwards compatibility here does not appear to be too onerous. Thanks, Florian ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: is this a bug in glibc or readpst? 2023-08-07 8:59 ` Florian Weimer @ 2023-08-07 11:00 ` Paul Wise 2023-08-07 11:46 ` Florian Weimer 0 siblings, 1 reply; 9+ messages in thread From: Paul Wise @ 2023-08-07 11:00 UTC (permalink / raw) To: Florian Weimer, Libc-help [-- Attachment #1: Type: text/plain, Size: 569 bytes --] On Mon, 2023-08-07 at 10:59 +0200, Florian Weimer wrote: > We could perhaps avoid the seek for read-only streams. We are about to > close the stream anyway, so fflush does not have any useful effect > anyway. Not sure what others think if it is worth the complexity. On > the other hand, preserving backwards compatibility here does not appear > to be too onerous. It might be worth checking what other implementations do, perhaps if they all avoid the seek in this case then maybe glibc should too? -- bye, pabs https://bonedaddy.net/pabs3/ [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: is this a bug in glibc or readpst? 2023-08-07 11:00 ` Paul Wise @ 2023-08-07 11:46 ` Florian Weimer 2023-08-07 11:58 ` Paul Wise 0 siblings, 1 reply; 9+ messages in thread From: Florian Weimer @ 2023-08-07 11:46 UTC (permalink / raw) To: Paul Wise; +Cc: Libc-help * Paul Wise: > On Mon, 2023-08-07 at 10:59 +0200, Florian Weimer wrote: > >> We could perhaps avoid the seek for read-only streams. We are about to >> close the stream anyway, so fflush does not have any useful effect >> anyway. Not sure what others think if it is worth the complexity. On >> the other hand, preserving backwards compatibility here does not appear >> to be too onerous. > > It might be worth checking what other implementations do, perhaps > if they all avoid the seek in this case then maybe glibc should too? I expect this is difficult to figure out for certain because of the internal stream state may not be very clear unless you are familiar with the code. 8-( So I'd prefer to fix the glibc regression instead. Thanks, Florian ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: is this a bug in glibc or readpst? 2023-08-07 11:46 ` Florian Weimer @ 2023-08-07 11:58 ` Paul Wise 0 siblings, 0 replies; 9+ messages in thread From: Paul Wise @ 2023-08-07 11:58 UTC (permalink / raw) To: Florian Weimer; +Cc: Libc-help [-- Attachment #1: Type: text/plain, Size: 696 bytes --] On Mon, 2023-08-07 at 13:46 +0200, Florian Weimer wrote: > I expect this is difficult to figure out for certain because of the > internal stream state may not be very clear unless you are familiar with > the code. 8-( I just got confirmation that the same issue is present on Alpine Linux, which uses the musl libc IIRC and it confirmed with musl-gcc on Debian. https://paste.debian.net/plainh/47ef35e3/ https://paste.debian.net/1288127/ > So I'd prefer to fix the glibc regression instead. OK. Given the likely widely varying behaviour in libc implementations, probably I should also fix the bug in libpst and readpst too. -- bye, pabs https://bonedaddy.net/pabs3/ [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2023-08-07 12:01 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-11-23 2:02 is this a bug in glibc or readpst? Paul Wise 2022-11-23 8:57 ` Florian Weimer 2022-11-24 0:06 ` Paul Wise 2022-12-02 17:37 ` Florian Weimer 2023-08-06 6:07 ` Paul Wise 2023-08-07 8:59 ` Florian Weimer 2023-08-07 11:00 ` Paul Wise 2023-08-07 11:46 ` Florian Weimer 2023-08-07 11:58 ` Paul Wise
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).