From: "Cristian Rodríguez" <cristian@rodriguez.im>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: libc-alpha@sourceware.org, alexandre.ferrieux@orange.com,
fweimer@redhat.com
Subject: Re: [PATCH v5] Fix #27777 - now use a doubly-linked list for _IO_list_all
Date: Wed, 15 May 2024 11:19:49 -0400
Message-ID: <CAPBLoAcm5-iQwwKCwJz5i+mkuyA6fUMv7_mEEVE48q0Y3vmg=Q@mail.gmail.com>
In-Reply-To: <20240513135014.1328169-1-hjl.tools@gmail.com>
On Mon, May 13, 2024 at 9:50 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> From: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
>
> This patch fixes BZ #27777 "fclose does a linear search, takes ages when
> many FILE* are opened". Simply put, the master list of opened (FILE*),
> namely _IO_list_all, is a singly-linked list. As a consequence, the
> removal of a single element is in O(N), which cripples the performance
> of fclose(). The patch switches to a doubly-linked list, yielding O(1)
> removal. The one padding field in struct _IO_FILE, __pad5, is renamed
> to _prevchain and holds the back link of the doubly-linked list. Fields
> in struct _IO_FILE after the _lock field are internal to glibc and
> opaque to applications, so we can change them as long as the size of
> struct _IO_FILE is unchanged, which is checked as part of the glibc ABI
> via the sizes of _IO_2_1_stdin_, _IO_2_1_stdout_ and _IO_2_1_stderr_.
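For anyone reading along, here is a minimal sketch of the technique the paragraph above describes: each node stores the address of the pointer that points at it, so removal needs no walk from the head. This is not the glibc code. The names node, list_head, link_in, un_link and list_length are made up for illustration, and the NULL check in link_in is my addition for a list that may be empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type.  In the patch the analogous fields are
   _chain (next element) and _prevchain (address of the link that
   points at this element).  */
struct node
{
  struct node *chain;
  struct node **prevchain;
  int id;
};

/* Hypothetical list head, playing the role of _IO_list_all.  */
static struct node *list_head;

/* Insert N at the head of the list.  */
static void
link_in (struct node *n)
{
  n->chain = list_head;
  n->prevchain = &list_head;
  /* Assumption of this sketch: the list may be empty, so guard the
     back-link update of the old head.  */
  if (list_head != NULL)
    list_head->prevchain = &n->chain;
  list_head = n;
}

/* Remove N in O(1): N records the address of whichever pointer
   currently points at it, so no search is needed.  */
static void
un_link (struct node *n)
{
  struct node **pr = n->prevchain;
  struct node *nx = n->chain;
  assert (*pr == n);
  *pr = nx;
  if (nx != NULL)
    nx->prevchain = pr;
  n->chain = NULL;
  n->prevchain = NULL;
}

/* Count the elements by walking the forward links.  */
static size_t
list_length (void)
{
  size_t len = 0;
  for (struct node *p = list_head; p != NULL; p = p->chain)
    len++;
  return len;
}
```

Storing a pointer to the previous link rather than to the previous node means the head pointer and an ordinary chain field are handled by the same code path, which is why un_link has no special case for the first element.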
>
> NB: When _IO_vtable_offset (fp) == 0, copy relocation will cover the
> whole struct _IO_FILE. Otherwise, only fields up to the _lock field
> will be copied to applications at run-time. It is used to check if
> the _prevchain field can be safely accessed.
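To make the layout argument concrete, here is a rough sketch of the same kind of check on a cut-down structure. The struct names file_old and file_new and their reduced field lists are hypothetical and only mirror the shape of the change. The sketch assumes that size_t and a data pointer have the same size on the target, as the patch itself does when it folds sizeof (size_t) into the pointer count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "before" layout: a size_t padding slot after the lock.  */
struct file_old
{
  void *lock;
  struct file_old *chain;
  size_t pad5;
  int mode;
  char unused[15 * sizeof (int) - 2 * sizeof (void *) - sizeof (size_t)];
};

/* Hypothetical "after" layout: the padding slot becomes a back link.  */
struct file_new
{
  void *lock;
  struct file_new *chain;
  struct file_new **prevchain;
  int mode;
  char unused[15 * sizeof (int) - 3 * sizeof (void *)];
};

/* The total size must not change.  */
_Static_assert (sizeof (struct file_old) == sizeof (struct file_new),
                "size of the structure is unchanged");

/* Fields that follow the reused slot must stay where they were.  */
_Static_assert (offsetof (struct file_old, mode)
                == offsetof (struct file_new, mode),
                "offset of mode is unchanged");

/* The new field lives in the part after the lock.  */
_Static_assert (offsetof (struct file_new, prevchain)
                > offsetof (struct file_new, lock),
                "offset of prevchain > offset of lock");
```

If any of these conditions failed, the build would stop at compile time, which is the same style of guard as the _Static_assert the patch adds in genops.c.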
>
> After opening 2 million (FILE*), the fclose() of 100 of them takes quite
> a few seconds without the patch, and under 2 seconds with it on a loaded
> machine.
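A measurement of this kind could be sketched roughly as below. This is not how the numbers above were obtained, which the message does not say. The helper name time_fclose is made up, and I use fmemopen so that the stream count is not bounded by the file descriptor limit; that assumes such streams are kept on the same list as ordinary ones, so substitute fopen on a real file if in doubt.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: open COUNT streams, then time the fclose of the
   first TO_CLOSE of them.  Returns elapsed seconds, or -1.0 on failure.
   The earliest-opened streams are closed first, which is the slow case
   for a head-inserted singly-linked list.  */
static double
time_fclose (size_t count, size_t to_close)
{
  static char buf[1] = "";
  if (count == 0 || to_close > count)
    return -1.0;

  FILE **fps = malloc (count * sizeof *fps);
  if (fps == NULL)
    return -1.0;

  for (size_t i = 0; i < count; i++)
    {
      fps[i] = fmemopen (buf, sizeof buf, "r");
      if (fps[i] == NULL)
        {
          /* Undo the streams opened so far before giving up.  */
          while (i-- > 0)
            fclose (fps[i]);
          free (fps);
          return -1.0;
        }
    }

  struct timespec t0, t1;
  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (size_t i = 0; i < to_close; i++)
    fclose (fps[i]);
  clock_gettime (CLOCK_MONOTONIC, &t1);

  /* Clean up the streams that were not part of the timed region.  */
  for (size_t i = to_close; i < count; i++)
    fclose (fps[i]);
  free (fps);

  return (double) (t1.tv_sec - t0.tv_sec)
         + (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_fclose (2000000, 100) would approximate the scenario quoted above, memory permitting; smaller counts are enough to see whether the cost of a single fclose grows with the number of open streams.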
>
> No test is added since there are no functional changes.
>
> Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
> Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
> Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
> ---
> libio/bits/types/struct_FILE.h | 4 ++--
> libio/genops.c | 26 ++++++++++++++++++++++++++
> libio/stdfiles.c | 15 +++++++++++++++
> 3 files changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/libio/bits/types/struct_FILE.h b/libio/bits/types/struct_FILE.h
> index 7cdaae86f8..d8d26639d1 100644
> --- a/libio/bits/types/struct_FILE.h
> +++ b/libio/bits/types/struct_FILE.h
> @@ -92,10 +92,10 @@ struct _IO_FILE_complete
>    struct _IO_wide_data *_wide_data;
>    struct _IO_FILE *_freeres_list;
>    void *_freeres_buf;
> -  size_t __pad5;
> +  struct _IO_FILE **_prevchain;
>    int _mode;
>    /* Make sure we don't get into trouble again. */
> -  char _unused2[15 * sizeof (int) - 4 * sizeof (void *) - sizeof (size_t)];
> +  char _unused2[15 * sizeof (int) - 5 * sizeof (void *)];
>  };
>
> /* These macros are used by bits/stdio.h and internal headers. */
> diff --git a/libio/genops.c b/libio/genops.c
> index bc45e60a09..994ee9c0b1 100644
> --- a/libio/genops.c
> +++ b/libio/genops.c
> @@ -48,6 +48,19 @@ flush_cleanup (void *not_used)
> }
> #endif
>
> +/* Fields in struct _IO_FILE after the _lock field are internal to
> +   glibc and opaque to applications. We can change them as long as
> +   the size of struct _IO_FILE is unchanged, which is checked as the
> +   part of glibc ABI with sizes of _IO_2_1_stdin_, _IO_2_1_stdout_
> +   and _IO_2_1_stderr_.
> +
> +   NB: When _IO_vtable_offset (fp) == 0, copy relocation will cover the
> +   whole struct _IO_FILE. Otherwise, only fields up to the _lock field
> +   will be copied. */
> +_Static_assert (offsetof (struct _IO_FILE, _prevchain)
> +                > offsetof (struct _IO_FILE, _lock),
> +                "offset of _prevchain > offset of _lock");
> +
> void
> _IO_un_link (struct _IO_FILE_plus *fp)
> {
> @@ -62,6 +75,14 @@ _IO_un_link (struct _IO_FILE_plus *fp)
> #endif
>        if (_IO_list_all == NULL)
>          ;
> +      else if (_IO_vtable_offset ((FILE *) fp) == 0)
> +        {
> +          FILE **pr = fp->file._prevchain;
> +          FILE *nx = fp->file._chain;
> +          *pr = nx;
> +          if (nx != NULL)
> +            nx->_prevchain = pr;
> +        }
>        else if (fp == _IO_list_all)
>          _IO_list_all = (struct _IO_FILE_plus *) _IO_list_all->file._chain;
>        else
> @@ -95,6 +116,11 @@ _IO_link_in (struct _IO_FILE_plus *fp)
>        _IO_flockfile ((FILE *) fp);
>  #endif
>        fp->file._chain = (FILE *) _IO_list_all;
> +      if (_IO_vtable_offset ((FILE *) fp) == 0)
> +        {
> +          fp->file._prevchain = (FILE **) &_IO_list_all;
> +          _IO_list_all->file._prevchain = &fp->file._chain;
> +        }
>        _IO_list_all = fp;
>  #ifdef _IO_MTSAFE_IO
>        _IO_funlockfile ((FILE *) fp);
> diff --git a/libio/stdfiles.c b/libio/stdfiles.c
> index cd8eca8bf3..d607fa02e0 100644
> --- a/libio/stdfiles.c
> +++ b/libio/stdfiles.c
> @@ -54,4 +54,19 @@ DEF_STDFILE(_IO_2_1_stdout_, 1, &_IO_2_1_stdin_, _IO_NO_READS);
> DEF_STDFILE(_IO_2_1_stderr_, 2, &_IO_2_1_stdout_, _IO_NO_READS+_IO_UNBUFFERED);
>
> struct _IO_FILE_plus *_IO_list_all = &_IO_2_1_stderr_;
> +
> +/* Finish the double-linking for stdfiles as static initialization
> +   cannot. */
> +
> +__THROW __attribute__ ((constructor))
> +static void
> +_IO_stdfiles_init (void)
> +{
> +  struct _IO_FILE **f;
> +  for (f = (struct _IO_FILE **) &_IO_list_all;
> +       *f != NULL;
> +       f = &(*f)->_chain)
> +    (*f)->_prevchain = f;
> +}
> +
> libc_hidden_data_def (_IO_list_all)
> --
> 2.45.0
>
This looks as good as it can be to me, given the constraints and the
need for backward compatibility.
Ping so it does not get forgotten.