From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x34.google.com (mail-oa1-x34.google.com [IPv6:2001:4860:4864:20::34]) by sourceware.org (Postfix) with ESMTPS id 161B03858D33 for ; Mon, 30 Oct 2023 19:04:37 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 161B03858D33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 161B03858D33 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2001:4860:4864:20::34 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698692679; cv=none; b=R773oQ/I6TohUWEiBNABmEeB5X/9bRQfPM77i7bQy5cW25fDO0//UMuS6K40Tf4+gek52WHvxBLU3h+E2PRS1KJWMLiM2ocenmdwOh/DKZP0IbjoKw0TIaTBWaHaYROyzBeyScplEhThCev3rpakySY/ZdU8VsHB8ZMrv3As7go= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698692679; c=relaxed/simple; bh=/FolXNQRpziVumoSDN9Lhw3QxCOb51+rEZEKSe63rY0=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=PlTZ4xDZJNrT2pmX2J+BIz5DcLNIxJHb49yYkVXDYWqCzuSmM0cmETOeKy1rbC9eB620rY8h52gtyAxd91A+wvbty0vcU4JqEJplwx3/eiAB547tMJQVfbHRnHddmPqePZZwoboX97atd/YTkmp1FVGSTvZr3GBkm+lCYuhXQOU= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-oa1-x34.google.com with SMTP id 586e51a60fabf-1e9a757e04eso2997844fac.0 for ; Mon, 30 Oct 2023 12:04:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1698692676; x=1699297476; darn=sourceware.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Hx5DQaEIMz+oq8tJ25glFl8O9AeDF5wbR0fewtQ2hl0=; b=Yfvs5koPBVRG7rCyus8UZMvEJZAKIsdcRgd37Z5DSkzQGjEvk1Uttr+pbI8sLQ/DWf tyKSAIZaKapY8p6xGtBiCXvisPz9b2gtYXGGEX9BcnvtsQdasKVzmiThDY3iEl/yxy0U dG6rkx0wVmLuZAaZMqEZyLcCUNFBnu7neoG3KlU4Bvbs2ai7YCgERFKsB12GtYpWHbbB B//JiM7ug93ErtrLR3b3Y4EMTkjAOdokKQSwy7OC3INgFEjE0HAUcG/CbWYfuFGdEHPn 4LzXbQCGdxA8UnEPwm57pfNSdLusu7xHZJTAvO6dOeZqEc3O5U5Kz+rpzi41sIrpOnjM 18FA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698692676; x=1699297476; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Hx5DQaEIMz+oq8tJ25glFl8O9AeDF5wbR0fewtQ2hl0=; b=fblZ+v64lN+SfpacH4v4I13xfzHr+9WL6UKoo7Ib3lpxIHiWePb+bBmH9ILLjAw8bh wdHP5hPW66onyo0T3JImza5yJqp4WnTI8+Wp7Mq0E9mLbYox4BwW+PGahfDN348iYgLY LAwPpagHKytTmP6YXv4oHQzV7jfFyygPDRwH2XK7LLzliz3mJ6iGQdeBeMPV7US+sLuR M8+nBhi7rDmzuPBXPpQS9k78QLbCz3l0+XafG3fiVgAkj9wqusQsXuqI19KbftIoWZIm nTnZqfP0FG7xHLxVUzdRIcnrhvn5xRHUy1eUSYTdbxRfTwNpikVFSVi1aAXdxcaKCHHh nCsw== X-Gm-Message-State: AOJu0YyK3Xt05l535GEobEZdieoTaum8hK1IyWmXlcAhRpmXltV75RV7 zJ8kSV1ycGbZc1UR3mRC88KbBNRy//kqc21pPBs= X-Google-Smtp-Source: AGHT+IEshb4Z4DmKx1zYZBqp6rQ9hkVTFpxDk/VA+n/L7ioPYXDV5Xo2BHzWkxwXwukRYwUz74Z7CnMjaWeyyeMRV8k= X-Received: by 2002:a05:6870:9a02:b0:1e9:7912:3bd9 with SMTP id fo2-20020a0568709a0200b001e979123bd9mr309813oab.9.1698692676232; Mon, 30 Oct 2023 12:04:36 -0700 (PDT) MIME-Version: 1.0 References: <20231003122251.3325435-1-adhemerval.zanella@linaro.org> <20231003122251.3325435-7-adhemerval.zanella@linaro.org> In-Reply-To: <20231003122251.3325435-7-adhemerval.zanella@linaro.org> From: Noah Goldstein Date: Mon, 30 Oct 2023 14:04:24 -0500 Message-ID: Subject: Re: [PATCH v8 6/7] stdlib: Remove use of mergesort on qsort (BZ 21719) To: Adhemerval Zanella Cc: libc-alpha@sourceware.org, Paul Eggert , Florian Weimer Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, Oct 3, 2023 at 7:23=E2=80=AFAM Adhemerval Zanella wrote: > > This patch removes the mergesort optimization on qsort implementation > and uses the introsort instead. The mergesort implementation has some > issues: > > - It is as-safe only for certain types sizes (if total size is less > than 1 KB with large element sizes also forcing memory allocation) > which contradicts the function documentation. Although not required > by the C standard, it is preferable and doable to have an O(1) space > implementation. > > - The malloc for certain element size and element number adds > arbitrary latency (might even be worse if malloc is interposed). > > - To avoid trigger swap from memory allocation the implementation > relies on system information that might be virtualized (for instance > VMs with overcommit memory) which might lead to potentially use of > swap even if system advertise more memory than actually has. The > check also have the downside of issuing syscalls where none is > expected (although only once per execution). > > - The mergesort is suboptimal on an already sorted array (BZ#21719). > > The introsort implementation is already optimized to use constant extra > space (due to the limit of total number of elements from maximum VM > size) and thus can be used to avoid the malloc usage issues. > > Resulting performance is slower due the usage of qsort, specially in the > worst-case scenario (partialy or sorted arrays) and due the fact > mergesort uses a slight improved swap operations. > > This change also renders the BZ#21719 fix unrequired (since it is meant > to fix the sorted input performance degradation for mergesort). The > manual is also updated to indicate the function is now async-cancel > safe. > > Checked on x86_64-linux-gnu. > --- > include/stdlib.h | 2 - > manual/argp.texi | 2 +- > manual/locale.texi | 3 +- > manual/search.texi | 7 +- > stdlib/Makefile | 2 - > stdlib/msort.c | 309 --------------------------------------------- > stdlib/qsort.c | 14 +- > 7 files changed, 16 insertions(+), 323 deletions(-) > delete mode 100644 stdlib/msort.c > > diff --git a/include/stdlib.h b/include/stdlib.h > index 0ed8271d9b..580da9be15 100644 > --- a/include/stdlib.h > +++ b/include/stdlib.h > @@ -149,8 +149,6 @@ extern int __posix_openpt (int __oflag) attribute_hid= den; > extern int __add_to_environ (const char *name, const char *value, > const char *combines, int replace) > attribute_hidden; > -extern void _quicksort (void *const pbase, size_t total_elems, > - size_t size, __compar_d_fn_t cmp, void *arg); > > extern int __on_exit (void (*__func) (int __status, void *__arg), void *= __arg); > > diff --git a/manual/argp.texi b/manual/argp.texi > index 0023441812..b77ad68285 100644 > --- a/manual/argp.texi > +++ b/manual/argp.texi > @@ -735,7 +735,7 @@ for options, bad phase of the moon, etc. > @c hol_set_group ok > @c hol_find_entry ok > @c hol_sort @mtslocale @acucorrupt > -@c qsort dup @acucorrupt > +@c qsort dup > @c hol_entry_qcmp @mtslocale > @c hol_entry_cmp @mtslocale > @c group_cmp ok > diff --git a/manual/locale.texi b/manual/locale.texi > index 720e0ca952..f6afa5dc44 100644 > --- a/manual/locale.texi > +++ b/manual/locale.texi > @@ -253,7 +253,7 @@ The symbols in this section are defined in the header= file @file{locale.h}. > @c calculate_head_size ok > @c __munmap ok > @c compute_hashval ok > -@c qsort dup @acucorrupt > +@c qsort dup > @c rangecmp ok > @c malloc @ascuheap @acsmem > @c strdup @ascuheap @acsmem > @@ -275,7 +275,6 @@ The symbols in this section are defined in the header= file @file{locale.h}. > @c realloc @ascuheap @acsmem > @c realloc @ascuheap @acsmem > @c fclose @ascuheap @asulock @acsmem @acsfd @aculock > -@c qsort @ascuheap @acsmem > @c alias_compare dup > @c libc_lock_unlock @aculock > @c _nl_explode_name @ascuheap @acsmem > diff --git a/manual/search.texi b/manual/search.texi > index 5691bf2f2b..a550858478 100644 > --- a/manual/search.texi > +++ b/manual/search.texi > @@ -159,7 +159,7 @@ To sort an array using an arbitrary comparison functi= on, use the > > @deftypefun void qsort (void *@var{array}, size_t @var{count}, size_t @v= ar{size}, comparison_fn_t @var{compare}) > @standards{ISO, stdlib.h} > -@safety{@prelim{}@mtsafe{}@assafe{}@acunsafe{@acucorrupt{}}} > +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} > The @code{qsort} function sorts the array @var{array}. The array > contains @var{count} elements, each of which is of size @var{size}. > > @@ -199,9 +199,8 @@ Functions}): > The @code{qsort} function derives its name from the fact that it was > originally implemented using the ``quick sort'' algorithm. > > -The implementation of @code{qsort} in this library might not be an > -in-place sort and might thereby use an extra amount of memory to store > -the array. > +The implementation of @code{qsort} in this library is an in-place sort > +and uses a constant extra space (allocated on the stack). > @end deftypefun > > @node Search/Sort Example > diff --git a/stdlib/Makefile b/stdlib/Makefile > index 25e42a77e7..095518eef4 100644 > --- a/stdlib/Makefile > +++ b/stdlib/Makefile > @@ -96,7 +96,6 @@ routines :=3D \ > mbtowc \ > mrand48 \ > mrand48_r \ > - msort \ > nrand48 \ > nrand48_r \ > old_atexit \ > @@ -380,7 +379,6 @@ generated +=3D \ > # generated > > CFLAGS-bsearch.c +=3D $(uses-callbacks) > -CFLAGS-msort.c +=3D $(uses-callbacks) > CFLAGS-qsort.c +=3D $(uses-callbacks) > CFLAGS-system.c +=3D -fexceptions > CFLAGS-system.os =3D -fomit-frame-pointer > diff --git a/stdlib/msort.c b/stdlib/msort.c > deleted file mode 100644 > index bbaa5e9f82..0000000000 > --- a/stdlib/msort.c > +++ /dev/null > @@ -1,309 +0,0 @@ > -/* An alternative to qsort, with an identical interface. > - This file is part of the GNU C Library. > - Copyright (C) 1992-2023 Free Software Foundation, Inc. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - . */ > - > -#include > -#include > -#include > -#include > -#include > -#include > -#include > -#include > - > -struct msort_param > -{ > - size_t s; > - size_t var; > - __compar_d_fn_t cmp; > - void *arg; > - char *t; > -}; > -static void msort_with_tmp (const struct msort_param *p, void *b, size_t= n); > - > -static void > -msort_with_tmp (const struct msort_param *p, void *b, size_t n) > -{ > - char *b1, *b2; > - size_t n1, n2; > - > - if (n <=3D 1) > - return; > - > - n1 =3D n / 2; > - n2 =3D n - n1; > - b1 =3D b; > - b2 =3D (char *) b + (n1 * p->s); > - > - msort_with_tmp (p, b1, n1); > - msort_with_tmp (p, b2, n2); > - > - char *tmp =3D p->t; > - const size_t s =3D p->s; > - __compar_d_fn_t cmp =3D p->cmp; > - void *arg =3D p->arg; > - switch (p->var) > - { > - case 0: > - while (n1 > 0 && n2 > 0) > - { > - if ((*cmp) (b1, b2, arg) <=3D 0) > - { > - *(uint32_t *) tmp =3D *(uint32_t *) b1; > - b1 +=3D sizeof (uint32_t); > - --n1; > - } > - else > - { > - *(uint32_t *) tmp =3D *(uint32_t *) b2; > - b2 +=3D sizeof (uint32_t); > - --n2; > - } > - tmp +=3D sizeof (uint32_t); > - } > - break; > - case 1: > - while (n1 > 0 && n2 > 0) > - { > - if ((*cmp) (b1, b2, arg) <=3D 0) > - { > - *(uint64_t *) tmp =3D *(uint64_t *) b1; > - b1 +=3D sizeof (uint64_t); > - --n1; > - } > - else > - { > - *(uint64_t *) tmp =3D *(uint64_t *) b2; > - b2 +=3D sizeof (uint64_t); > - --n2; > - } > - tmp +=3D sizeof (uint64_t); > - } > - break; > - case 2: > - while (n1 > 0 && n2 > 0) > - { > - unsigned long *tmpl =3D (unsigned long *) tmp; > - unsigned long *bl; > - > - tmp +=3D s; > - if ((*cmp) (b1, b2, arg) <=3D 0) > - { > - bl =3D (unsigned long *) b1; > - b1 +=3D s; > - --n1; > - } > - else > - { > - bl =3D (unsigned long *) b2; > - b2 +=3D s; > - --n2; > - } > - while (tmpl < (unsigned long *) tmp) > - *tmpl++ =3D *bl++; > - } > - break; > - case 3: > - while (n1 > 0 && n2 > 0) > - { > - if ((*cmp) (*(const void **) b1, *(const void **) b2, arg) <=3D= 0) > - { > - *(void **) tmp =3D *(void **) b1; > - b1 +=3D sizeof (void *); > - --n1; > - } > - else > - { > - *(void **) tmp =3D *(void **) b2; > - b2 +=3D sizeof (void *); > - --n2; > - } > - tmp +=3D sizeof (void *); > - } > - break; > - default: > - while (n1 > 0 && n2 > 0) > - { > - if ((*cmp) (b1, b2, arg) <=3D 0) > - { > - tmp =3D (char *) __mempcpy (tmp, b1, s); > - b1 +=3D s; > - --n1; > - } > - else > - { > - tmp =3D (char *) __mempcpy (tmp, b2, s); > - b2 +=3D s; > - --n2; > - } > - } > - break; > - } > - > - if (n1 > 0) > - memcpy (tmp, b1, n1 * s); > - memcpy (b, p->t, (n - n2) * s); > -} > - > - > -void > -__qsort_r (void *b, size_t n, size_t s, __compar_d_fn_t cmp, void *arg) > -{ > - size_t size =3D n * s; > - char *tmp =3D NULL; > - struct msort_param p; > - > - /* For large object sizes use indirect sorting. */ > - if (s > 32) > - size =3D 2 * n * sizeof (void *) + s; > - > - if (size < 1024) > - /* The temporary array is small, so put it on the stack. */ > - p.t =3D __alloca (size); > - else > - { > - /* We should avoid allocating too much memory since this might > - have to be backed up by swap space. */ > - static long int phys_pages; > - static int pagesize; > - > - if (pagesize =3D=3D 0) > - { > - phys_pages =3D __sysconf (_SC_PHYS_PAGES); > - > - if (phys_pages =3D=3D -1) > - /* Error while determining the memory size. So let's > - assume there is enough memory. Otherwise the > - implementer should provide a complete implementation of > - the `sysconf' function. */ > - phys_pages =3D (long int) (~0ul >> 1); > - > - /* The following determines that we will never use more than > - a quarter of the physical memory. */ > - phys_pages /=3D 4; > - > - /* Make sure phys_pages is written to memory. */ > - atomic_write_barrier (); > - > - pagesize =3D __sysconf (_SC_PAGESIZE); > - } > - > - /* Just a comment here. We cannot compute > - phys_pages * pagesize > - and compare the needed amount of memory against this value. > - The problem is that some systems might have more physical > - memory then can be represented with a `size_t' value (when > - measured in bytes. */ > - > - /* If the memory requirements are too high don't allocate memory. = */ > - if (size / pagesize > (size_t) phys_pages) > - { > - _quicksort (b, n, s, cmp, arg); > - return; > - } > - > - /* It's somewhat large, so malloc it. */ > - int save =3D errno; > - tmp =3D malloc (size); > - __set_errno (save); > - if (tmp =3D=3D NULL) > - { > - /* Couldn't get space, so use the slower algorithm > - that doesn't need a temporary array. */ > - _quicksort (b, n, s, cmp, arg); > - return; > - } > - p.t =3D tmp; > - } > - > - p.s =3D s; > - p.var =3D 4; > - p.cmp =3D cmp; > - p.arg =3D arg; > - > - if (s > 32) > - { > - /* Indirect sorting. */ > - char *ip =3D (char *) b; > - void **tp =3D (void **) (p.t + n * sizeof (void *)); > - void **t =3D tp; > - void *tmp_storage =3D (void *) (tp + n); > - > - while ((void *) t < tmp_storage) > - { > - *t++ =3D ip; > - ip +=3D s; > - } > - p.s =3D sizeof (void *); > - p.var =3D 3; > - msort_with_tmp (&p, p.t + n * sizeof (void *), n); > - > - /* tp[0] .. tp[n - 1] is now sorted, copy around entries of > - the original array. Knuth vol. 3 (2nd ed.) exercise 5.2-10. */ > - char *kp; > - size_t i; > - for (i =3D 0, ip =3D (char *) b; i < n; i++, ip +=3D s) > - if ((kp =3D tp[i]) !=3D ip) > - { > - size_t j =3D i; > - char *jp =3D ip; > - memcpy (tmp_storage, ip, s); > - > - do > - { > - size_t k =3D (kp - (char *) b) / s; > - tp[j] =3D jp; > - memcpy (jp, kp, s); > - j =3D k; > - jp =3D kp; > - kp =3D tp[k]; > - } > - while (kp !=3D ip); > - > - tp[j] =3D jp; > - memcpy (jp, tmp_storage, s); > - } > - } > - else > - { > - if ((s & (sizeof (uint32_t) - 1)) =3D=3D 0 > - && ((uintptr_t) b) % __alignof__ (uint32_t) =3D=3D 0) > - { > - if (s =3D=3D sizeof (uint32_t)) > - p.var =3D 0; > - else if (s =3D=3D sizeof (uint64_t) > - && ((uintptr_t) b) % __alignof__ (uint64_t) =3D=3D 0) > - p.var =3D 1; > - else if ((s & (sizeof (unsigned long) - 1)) =3D=3D 0 > - && ((uintptr_t) b) > - % __alignof__ (unsigned long) =3D=3D 0) > - p.var =3D 2; > - } > - msort_with_tmp (&p, b, n); > - } > - free (tmp); > -} > -libc_hidden_def (__qsort_r) > -weak_alias (__qsort_r, qsort_r) > - > - > -void > -qsort (void *b, size_t n, size_t s, __compar_fn_t cmp) > -{ > - return __qsort_r (b, n, s, (__compar_d_fn_t) cmp, NULL); > -} > -libc_hidden_def (qsort) > diff --git a/stdlib/qsort.c b/stdlib/qsort.c > index d5f205affc..fd32a165e7 100644 > --- a/stdlib/qsort.c > +++ b/stdlib/qsort.c > @@ -19,7 +19,6 @@ > Engineering a sort function; Jon Bentley and M. Douglas McIlroy; > Software - Practice and Experience; Vol. 23 (11), 1249-1265, 1993. *= / > > -#include > #include > #include > #include > @@ -265,8 +264,8 @@ insertion_sort_qsort_partitions (void *const pbase, s= ize_t total_elems, > stack size is needed (actually O(1) in this case)! */ > > void > -_quicksort (void *const pbase, size_t total_elems, size_t size, > - __compar_d_fn_t cmp, void *arg) > +__qsort_r (void *const pbase, size_t total_elems, size_t size, > + __compar_d_fn_t cmp, void *arg) > { > char *base_ptr =3D (char *) pbase; > > @@ -398,3 +397,12 @@ _quicksort (void *const pbase, size_t total_elems, s= ize_t size, > insertion_sort_qsort_partitions (pbase, total_elems, size, swap_type, = cmp, > arg); > } > +libc_hidden_def (__qsort_r) > +weak_alias (__qsort_r, qsort_r) > + > +void > +qsort (void *b, size_t n, size_t s, __compar_fn_t cmp) > +{ > + return __qsort_r (b, n, s, (__compar_d_fn_t) cmp, NULL); > +} > +libc_hidden_def (qsort) > -- > 2.34.1 > LGTM Reviewed-by: Noah Goldstein