From: Noah Goldstein <goldstein.w.n@gmail.com>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: libc-alpha@sourceware.org, Paul Eggert <eggert@cs.ucla.edu>,
Florian Weimer <fweimer@redhat.com>
Subject: Re: [PATCH v8 6/7] stdlib: Remove use of mergesort on qsort (BZ 21719)
Date: Mon, 30 Oct 2023 14:04:24 -0500 [thread overview]
Message-ID: <CAFUsyfL4kM9DnK1vFzJhr46YZud6aiXTskv6mT6afZZ=fTqTfQ@mail.gmail.com> (raw)
In-Reply-To: <20231003122251.3325435-7-adhemerval.zanella@linaro.org>
On Tue, Oct 3, 2023 at 7:23 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
> This patch removes the mergesort optimization on qsort implementation
> and uses the introsort instead. The mergesort implementation has some
> issues:
>
> - It is as-safe only for certain types sizes (if total size is less
> than 1 KB with large element sizes also forcing memory allocation)
> which contradicts the function documentation. Although not required
> by the C standard, it is preferable and doable to have an O(1) space
> implementation.
>
> - The malloc for certain element size and element number adds
> arbitrary latency (might even be worse if malloc is interposed).
>
> - To avoid trigger swap from memory allocation the implementation
> relies on system information that might be virtualized (for instance
> VMs with overcommit memory) which might lead to potentially use of
> swap even if system advertise more memory than actually has. The
> check also have the downside of issuing syscalls where none is
> expected (although only once per execution).
>
> - The mergesort is suboptimal on an already sorted array (BZ#21719).
>
> The introsort implementation is already optimized to use constant extra
> space (due to the limit of total number of elements from maximum VM
> size) and thus can be used to avoid the malloc usage issues.
>
> Resulting performance is slower due the usage of qsort, specially in the
> worst-case scenario (partialy or sorted arrays) and due the fact
> mergesort uses a slight improved swap operations.
>
> This change also renders the BZ#21719 fix unrequired (since it is meant
> to fix the sorted input performance degradation for mergesort). The
> manual is also updated to indicate the function is now async-cancel
> safe.
>
> Checked on x86_64-linux-gnu.
> ---
> include/stdlib.h | 2 -
> manual/argp.texi | 2 +-
> manual/locale.texi | 3 +-
> manual/search.texi | 7 +-
> stdlib/Makefile | 2 -
> stdlib/msort.c | 309 ---------------------------------------------
> stdlib/qsort.c | 14 +-
> 7 files changed, 16 insertions(+), 323 deletions(-)
> delete mode 100644 stdlib/msort.c
>
> diff --git a/include/stdlib.h b/include/stdlib.h
> index 0ed8271d9b..580da9be15 100644
> --- a/include/stdlib.h
> +++ b/include/stdlib.h
> @@ -149,8 +149,6 @@ extern int __posix_openpt (int __oflag) attribute_hidden;
> extern int __add_to_environ (const char *name, const char *value,
> const char *combines, int replace)
> attribute_hidden;
> -extern void _quicksort (void *const pbase, size_t total_elems,
> - size_t size, __compar_d_fn_t cmp, void *arg);
>
> extern int __on_exit (void (*__func) (int __status, void *__arg), void *__arg);
>
> diff --git a/manual/argp.texi b/manual/argp.texi
> index 0023441812..b77ad68285 100644
> --- a/manual/argp.texi
> +++ b/manual/argp.texi
> @@ -735,7 +735,7 @@ for options, bad phase of the moon, etc.
> @c hol_set_group ok
> @c hol_find_entry ok
> @c hol_sort @mtslocale @acucorrupt
> -@c qsort dup @acucorrupt
> +@c qsort dup
> @c hol_entry_qcmp @mtslocale
> @c hol_entry_cmp @mtslocale
> @c group_cmp ok
> diff --git a/manual/locale.texi b/manual/locale.texi
> index 720e0ca952..f6afa5dc44 100644
> --- a/manual/locale.texi
> +++ b/manual/locale.texi
> @@ -253,7 +253,7 @@ The symbols in this section are defined in the header file @file{locale.h}.
> @c calculate_head_size ok
> @c __munmap ok
> @c compute_hashval ok
> -@c qsort dup @acucorrupt
> +@c qsort dup
> @c rangecmp ok
> @c malloc @ascuheap @acsmem
> @c strdup @ascuheap @acsmem
> @@ -275,7 +275,6 @@ The symbols in this section are defined in the header file @file{locale.h}.
> @c realloc @ascuheap @acsmem
> @c realloc @ascuheap @acsmem
> @c fclose @ascuheap @asulock @acsmem @acsfd @aculock
> -@c qsort @ascuheap @acsmem
> @c alias_compare dup
> @c libc_lock_unlock @aculock
> @c _nl_explode_name @ascuheap @acsmem
> diff --git a/manual/search.texi b/manual/search.texi
> index 5691bf2f2b..a550858478 100644
> --- a/manual/search.texi
> +++ b/manual/search.texi
> @@ -159,7 +159,7 @@ To sort an array using an arbitrary comparison function, use the
>
> @deftypefun void qsort (void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare})
> @standards{ISO, stdlib.h}
> -@safety{@prelim{}@mtsafe{}@assafe{}@acunsafe{@acucorrupt{}}}
> +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
> The @code{qsort} function sorts the array @var{array}. The array
> contains @var{count} elements, each of which is of size @var{size}.
>
> @@ -199,9 +199,8 @@ Functions}):
> The @code{qsort} function derives its name from the fact that it was
> originally implemented using the ``quick sort'' algorithm.
>
> -The implementation of @code{qsort} in this library might not be an
> -in-place sort and might thereby use an extra amount of memory to store
> -the array.
> +The implementation of @code{qsort} in this library is an in-place sort
> +and uses a constant extra space (allocated on the stack).
> @end deftypefun
>
> @node Search/Sort Example
> diff --git a/stdlib/Makefile b/stdlib/Makefile
> index 25e42a77e7..095518eef4 100644
> --- a/stdlib/Makefile
> +++ b/stdlib/Makefile
> @@ -96,7 +96,6 @@ routines := \
> mbtowc \
> mrand48 \
> mrand48_r \
> - msort \
> nrand48 \
> nrand48_r \
> old_atexit \
> @@ -380,7 +379,6 @@ generated += \
> # generated
>
> CFLAGS-bsearch.c += $(uses-callbacks)
> -CFLAGS-msort.c += $(uses-callbacks)
> CFLAGS-qsort.c += $(uses-callbacks)
> CFLAGS-system.c += -fexceptions
> CFLAGS-system.os = -fomit-frame-pointer
> diff --git a/stdlib/msort.c b/stdlib/msort.c
> deleted file mode 100644
> index bbaa5e9f82..0000000000
> --- a/stdlib/msort.c
> +++ /dev/null
> @@ -1,309 +0,0 @@
> -/* An alternative to qsort, with an identical interface.
> - This file is part of the GNU C Library.
> - Copyright (C) 1992-2023 Free Software Foundation, Inc.
> -
> - The GNU C Library is free software; you can redistribute it and/or
> - modify it under the terms of the GNU Lesser General Public
> - License as published by the Free Software Foundation; either
> - version 2.1 of the License, or (at your option) any later version.
> -
> - The GNU C Library is distributed in the hope that it will be useful,
> - but WITHOUT ANY WARRANTY; without even the implied warranty of
> - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> - Lesser General Public License for more details.
> -
> - You should have received a copy of the GNU Lesser General Public
> - License along with the GNU C Library; if not, see
> - <https://www.gnu.org/licenses/>. */
> -
> -#include <alloca.h>
> -#include <stdint.h>
> -#include <stdlib.h>
> -#include <string.h>
> -#include <unistd.h>
> -#include <memcopy.h>
> -#include <errno.h>
> -#include <atomic.h>
> -
> -struct msort_param
> -{
> - size_t s;
> - size_t var;
> - __compar_d_fn_t cmp;
> - void *arg;
> - char *t;
> -};
> -static void msort_with_tmp (const struct msort_param *p, void *b, size_t n);
> -
> -static void
> -msort_with_tmp (const struct msort_param *p, void *b, size_t n)
> -{
> - char *b1, *b2;
> - size_t n1, n2;
> -
> - if (n <= 1)
> - return;
> -
> - n1 = n / 2;
> - n2 = n - n1;
> - b1 = b;
> - b2 = (char *) b + (n1 * p->s);
> -
> - msort_with_tmp (p, b1, n1);
> - msort_with_tmp (p, b2, n2);
> -
> - char *tmp = p->t;
> - const size_t s = p->s;
> - __compar_d_fn_t cmp = p->cmp;
> - void *arg = p->arg;
> - switch (p->var)
> - {
> - case 0:
> - while (n1 > 0 && n2 > 0)
> - {
> - if ((*cmp) (b1, b2, arg) <= 0)
> - {
> - *(uint32_t *) tmp = *(uint32_t *) b1;
> - b1 += sizeof (uint32_t);
> - --n1;
> - }
> - else
> - {
> - *(uint32_t *) tmp = *(uint32_t *) b2;
> - b2 += sizeof (uint32_t);
> - --n2;
> - }
> - tmp += sizeof (uint32_t);
> - }
> - break;
> - case 1:
> - while (n1 > 0 && n2 > 0)
> - {
> - if ((*cmp) (b1, b2, arg) <= 0)
> - {
> - *(uint64_t *) tmp = *(uint64_t *) b1;
> - b1 += sizeof (uint64_t);
> - --n1;
> - }
> - else
> - {
> - *(uint64_t *) tmp = *(uint64_t *) b2;
> - b2 += sizeof (uint64_t);
> - --n2;
> - }
> - tmp += sizeof (uint64_t);
> - }
> - break;
> - case 2:
> - while (n1 > 0 && n2 > 0)
> - {
> - unsigned long *tmpl = (unsigned long *) tmp;
> - unsigned long *bl;
> -
> - tmp += s;
> - if ((*cmp) (b1, b2, arg) <= 0)
> - {
> - bl = (unsigned long *) b1;
> - b1 += s;
> - --n1;
> - }
> - else
> - {
> - bl = (unsigned long *) b2;
> - b2 += s;
> - --n2;
> - }
> - while (tmpl < (unsigned long *) tmp)
> - *tmpl++ = *bl++;
> - }
> - break;
> - case 3:
> - while (n1 > 0 && n2 > 0)
> - {
> - if ((*cmp) (*(const void **) b1, *(const void **) b2, arg) <= 0)
> - {
> - *(void **) tmp = *(void **) b1;
> - b1 += sizeof (void *);
> - --n1;
> - }
> - else
> - {
> - *(void **) tmp = *(void **) b2;
> - b2 += sizeof (void *);
> - --n2;
> - }
> - tmp += sizeof (void *);
> - }
> - break;
> - default:
> - while (n1 > 0 && n2 > 0)
> - {
> - if ((*cmp) (b1, b2, arg) <= 0)
> - {
> - tmp = (char *) __mempcpy (tmp, b1, s);
> - b1 += s;
> - --n1;
> - }
> - else
> - {
> - tmp = (char *) __mempcpy (tmp, b2, s);
> - b2 += s;
> - --n2;
> - }
> - }
> - break;
> - }
> -
> - if (n1 > 0)
> - memcpy (tmp, b1, n1 * s);
> - memcpy (b, p->t, (n - n2) * s);
> -}
> -
> -
> -void
> -__qsort_r (void *b, size_t n, size_t s, __compar_d_fn_t cmp, void *arg)
> -{
> - size_t size = n * s;
> - char *tmp = NULL;
> - struct msort_param p;
> -
> - /* For large object sizes use indirect sorting. */
> - if (s > 32)
> - size = 2 * n * sizeof (void *) + s;
> -
> - if (size < 1024)
> - /* The temporary array is small, so put it on the stack. */
> - p.t = __alloca (size);
> - else
> - {
> - /* We should avoid allocating too much memory since this might
> - have to be backed up by swap space. */
> - static long int phys_pages;
> - static int pagesize;
> -
> - if (pagesize == 0)
> - {
> - phys_pages = __sysconf (_SC_PHYS_PAGES);
> -
> - if (phys_pages == -1)
> - /* Error while determining the memory size. So let's
> - assume there is enough memory. Otherwise the
> - implementer should provide a complete implementation of
> - the `sysconf' function. */
> - phys_pages = (long int) (~0ul >> 1);
> -
> - /* The following determines that we will never use more than
> - a quarter of the physical memory. */
> - phys_pages /= 4;
> -
> - /* Make sure phys_pages is written to memory. */
> - atomic_write_barrier ();
> -
> - pagesize = __sysconf (_SC_PAGESIZE);
> - }
> -
> - /* Just a comment here. We cannot compute
> - phys_pages * pagesize
> - and compare the needed amount of memory against this value.
> - The problem is that some systems might have more physical
> - memory then can be represented with a `size_t' value (when
> - measured in bytes. */
> -
> - /* If the memory requirements are too high don't allocate memory. */
> - if (size / pagesize > (size_t) phys_pages)
> - {
> - _quicksort (b, n, s, cmp, arg);
> - return;
> - }
> -
> - /* It's somewhat large, so malloc it. */
> - int save = errno;
> - tmp = malloc (size);
> - __set_errno (save);
> - if (tmp == NULL)
> - {
> - /* Couldn't get space, so use the slower algorithm
> - that doesn't need a temporary array. */
> - _quicksort (b, n, s, cmp, arg);
> - return;
> - }
> - p.t = tmp;
> - }
> -
> - p.s = s;
> - p.var = 4;
> - p.cmp = cmp;
> - p.arg = arg;
> -
> - if (s > 32)
> - {
> - /* Indirect sorting. */
> - char *ip = (char *) b;
> - void **tp = (void **) (p.t + n * sizeof (void *));
> - void **t = tp;
> - void *tmp_storage = (void *) (tp + n);
> -
> - while ((void *) t < tmp_storage)
> - {
> - *t++ = ip;
> - ip += s;
> - }
> - p.s = sizeof (void *);
> - p.var = 3;
> - msort_with_tmp (&p, p.t + n * sizeof (void *), n);
> -
> - /* tp[0] .. tp[n - 1] is now sorted, copy around entries of
> - the original array. Knuth vol. 3 (2nd ed.) exercise 5.2-10. */
> - char *kp;
> - size_t i;
> - for (i = 0, ip = (char *) b; i < n; i++, ip += s)
> - if ((kp = tp[i]) != ip)
> - {
> - size_t j = i;
> - char *jp = ip;
> - memcpy (tmp_storage, ip, s);
> -
> - do
> - {
> - size_t k = (kp - (char *) b) / s;
> - tp[j] = jp;
> - memcpy (jp, kp, s);
> - j = k;
> - jp = kp;
> - kp = tp[k];
> - }
> - while (kp != ip);
> -
> - tp[j] = jp;
> - memcpy (jp, tmp_storage, s);
> - }
> - }
> - else
> - {
> - if ((s & (sizeof (uint32_t) - 1)) == 0
> - && ((uintptr_t) b) % __alignof__ (uint32_t) == 0)
> - {
> - if (s == sizeof (uint32_t))
> - p.var = 0;
> - else if (s == sizeof (uint64_t)
> - && ((uintptr_t) b) % __alignof__ (uint64_t) == 0)
> - p.var = 1;
> - else if ((s & (sizeof (unsigned long) - 1)) == 0
> - && ((uintptr_t) b)
> - % __alignof__ (unsigned long) == 0)
> - p.var = 2;
> - }
> - msort_with_tmp (&p, b, n);
> - }
> - free (tmp);
> -}
> -libc_hidden_def (__qsort_r)
> -weak_alias (__qsort_r, qsort_r)
> -
> -
> -void
> -qsort (void *b, size_t n, size_t s, __compar_fn_t cmp)
> -{
> - return __qsort_r (b, n, s, (__compar_d_fn_t) cmp, NULL);
> -}
> -libc_hidden_def (qsort)
> diff --git a/stdlib/qsort.c b/stdlib/qsort.c
> index d5f205affc..fd32a165e7 100644
> --- a/stdlib/qsort.c
> +++ b/stdlib/qsort.c
> @@ -19,7 +19,6 @@
> Engineering a sort function; Jon Bentley and M. Douglas McIlroy;
> Software - Practice and Experience; Vol. 23 (11), 1249-1265, 1993. */
>
> -#include <alloca.h>
> #include <limits.h>
> #include <memswap.h>
> #include <stdlib.h>
> @@ -265,8 +264,8 @@ insertion_sort_qsort_partitions (void *const pbase, size_t total_elems,
> stack size is needed (actually O(1) in this case)! */
>
> void
> -_quicksort (void *const pbase, size_t total_elems, size_t size,
> - __compar_d_fn_t cmp, void *arg)
> +__qsort_r (void *const pbase, size_t total_elems, size_t size,
> + __compar_d_fn_t cmp, void *arg)
> {
> char *base_ptr = (char *) pbase;
>
> @@ -398,3 +397,12 @@ _quicksort (void *const pbase, size_t total_elems, size_t size,
> insertion_sort_qsort_partitions (pbase, total_elems, size, swap_type, cmp,
> arg);
> }
> +libc_hidden_def (__qsort_r)
> +weak_alias (__qsort_r, qsort_r)
> +
> +void
> +qsort (void *b, size_t n, size_t s, __compar_fn_t cmp)
> +{
> + return __qsort_r (b, n, s, (__compar_d_fn_t) cmp, NULL);
> +}
> +libc_hidden_def (qsort)
> --
> 2.34.1
>
LGTM
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
next prev parent reply other threads:[~2023-10-30 19:04 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-03 12:22 [PATCH v8 0/7] Use introsort for qsort Adhemerval Zanella
2023-10-03 12:22 ` [PATCH v8 1/7] string: Add internal memswap implementation Adhemerval Zanella
2023-10-03 17:42 ` Noah Goldstein
2023-10-27 20:23 ` Adhemerval Zanella Netto
2023-10-27 22:47 ` Gabriel Ravier
2023-10-28 1:30 ` Noah Goldstein
2023-10-30 19:02 ` Noah Goldstein
2023-10-03 12:22 ` [PATCH v8 2/7] stdlib: Optimization qsort{_r} swap implementation Adhemerval Zanella
2023-10-30 19:02 ` Noah Goldstein
2023-10-03 12:22 ` [PATCH v8 3/7] stdlib: Move insertion sort out qsort Adhemerval Zanella
2023-10-30 19:03 ` Noah Goldstein
2023-10-03 12:22 ` [PATCH v8 4/7] stdlib: qsort: Move some macros to inline function Adhemerval Zanella
2023-10-30 19:03 ` Noah Goldstein
2023-10-03 12:22 ` [PATCH v8 5/7] stdlib: Implement introsort for qsort (BZ 19305) Adhemerval Zanella
2023-10-30 19:03 ` Noah Goldstein
2023-10-03 12:22 ` [PATCH v8 6/7] stdlib: Remove use of mergesort on qsort (BZ 21719) Adhemerval Zanella
2023-10-30 19:04 ` Noah Goldstein [this message]
2023-10-03 12:22 ` [PATCH v8 7/7] stdlib: Add more qsort{_r} coverage Adhemerval Zanella
2023-10-30 19:04 ` Noah Goldstein
2023-10-03 18:13 ` [PATCH v8 0/7] Use introsort for qsort Noah Goldstein
2023-10-27 20:24 ` Adhemerval Zanella Netto
2023-10-28 1:31 ` Noah Goldstein
-- strict thread matches above, loose matches on Subject: below --
2023-10-02 19:33 Adhemerval Zanella
2023-10-02 19:33 ` [PATCH v8 6/7] stdlib: Remove use of mergesort on qsort (BZ 21719) Adhemerval Zanella
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAFUsyfL4kM9DnK1vFzJhr46YZud6aiXTskv6mT6afZZ=fTqTfQ@mail.gmail.com' \
--to=goldstein.w.n@gmail.com \
--cc=adhemerval.zanella@linaro.org \
--cc=eggert@cs.ucla.edu \
--cc=fweimer@redhat.com \
--cc=libc-alpha@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).