From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x30.google.com (mail-oa1-x30.google.com [IPv6:2001:4860:4864:20::30]) by sourceware.org (Postfix) with ESMTPS id 7A268385559A for ; Mon, 30 Oct 2023 19:04:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7A268385559A Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 7A268385559A Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2001:4860:4864:20::30 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698692651; cv=none; b=mTMoCgaP5fTbkk9HV6OvtX/TQ62LKK8mUiswzzV8AmF47VlxtNGAj+dEiQAP7jAPb9AJFdBGSlBdviI3dynmTdt/6/uMxYWmXzXM5wT/qaTkFMmD563h2j9svmNDRwF4eXmMf/DyvtjSJg7txpbfwIyfRDUuWYBki4jNQOOd5vI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698692651; c=relaxed/simple; bh=sDubBKcby33EgpNR5t6dxTvAuH3pjOYE6o+UQix+fF4=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=ELQ9EymeVjDqGCp8cUKHk4GOsltmzUEtQGhDUbJOqpmrVJy16WsMTXgqx1SuMSTllSthXSC2EZzdcVfVn1xwmy+w2zQyEX6v044A5vt7igZKpRNfMxDu8Hw9j2JahKiLhvXT2MbxDNitB+3XI2v5ztzWKBlV2y1rNwzF8R2beVQ= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-oa1-x30.google.com with SMTP id 586e51a60fabf-1e10ba12fd3so3206120fac.1 for ; Mon, 30 Oct 2023 12:04:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1698692650; x=1699297450; darn=sourceware.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=LBvIwHjWLom9d4eJVcdXrNFmBmPxrJu9AeJNCgA7KXg=; b=alKDoT/kF0GXj2yVlApJeYRrnpAEFYphouugiZHaNDLS5NVnj6o9Nd/0TJc49umT3+ nBzVIbKk4fENur17me+hDtNitZOROjc5UcRVgZC4UOse9AzCDGgP5pxnpikNz6w/gg/b RcCWo49i425xxATwU4GgSyrM5E4H6CP5Xgy1m0uKRQANmF/s+I1vUXBOHQU88PMpX/BU X9gwOAgLJSo9l9q2znFRWh4+aJ7De5m3cRC1DKl4bx168HSH+CPcTtCJgjntmz2iDw7v RG0Sd0Q+2cfZMz/zsBeq0yCJvFpKpGPKgPEy4OwQLbaccL9HVsykWkF8nYTAtMcfx9AH zVUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698692650; x=1699297450; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LBvIwHjWLom9d4eJVcdXrNFmBmPxrJu9AeJNCgA7KXg=; b=TMO7wfcftdEdC+75d0JqoG057ZdVF1t5hvR7sI5t/ohwrceAw1L4dMR7gU5Les7yJv CqDCgh/3apvfHXaAL0EUnQX3o21irpekK05YR4ngvL7G1ynhTDu4b9kVdgk478r+p9wx jvxGj19e79bM3XGjFWyE6SgHkAjvlzO6k3gbAZgXTuFzp6cOFTbTUc+gjcooknKg1rlq lp6qKZ5ABzPi/too+mfoUordbZ+dw+La64s3GBAFoHJiA54Up/nYkMNQnkBKPbj9cRrp /OSqxDniNUGrEVnuiVXhMkMu5A9YwUmdzJiYVj0WAOexLoTQL1ZEuvi3tcnQjlK2yvtJ WENA== X-Gm-Message-State: AOJu0YwGAH5niUFIXSLYtcO7p5818jBonUO6fDpT6aUOYntW0GVfOfXz QkUksWT7cs/N1YilorGt8+MK2yB0R1bp8Rkm4nw= X-Google-Smtp-Source: AGHT+IEQMP2qX/YEOvSyCX3ngF94f4iz4WT0W3OOYNVERtR2SQV2Kh8h4orzzZBpMfm35G7kCW+Upw+4xk5bn0ehlL8= X-Received: by 2002:a05:6870:d88b:b0:1d5:a85a:13b6 with SMTP id oe11-20020a056870d88b00b001d5a85a13b6mr12221821oac.45.1698692649867; Mon, 30 Oct 2023 12:04:09 -0700 (PDT) MIME-Version: 1.0 References: <20231003122251.3325435-1-adhemerval.zanella@linaro.org> <20231003122251.3325435-6-adhemerval.zanella@linaro.org> In-Reply-To: <20231003122251.3325435-6-adhemerval.zanella@linaro.org> From: Noah Goldstein Date: Mon, 30 Oct 2023 14:03:57 -0500 Message-ID: Subject: Re: [PATCH v8 5/7] stdlib: Implement introsort for qsort (BZ 19305) To: Adhemerval Zanella Cc: libc-alpha@sourceware.org, Paul Eggert , Florian Weimer Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, Oct 3, 2023 at 7:23=E2=80=AFAM Adhemerval Zanella wrote: > > This patch makes the quicksort implementation to acts as introsort, to > avoid worse-case performance (and thus making it O(nlog n)). It switch > to heapsort when the depth level reaches 2*log2(total elements). The > heapsort is a textbook implementation. > > Checked on x86_64-linux-gnu and aarch64-linux-gnu. > --- > stdlib/qsort.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 82 insertions(+), 7 deletions(-) > > diff --git a/stdlib/qsort.c b/stdlib/qsort.c > index 80706b3357..d5f205affc 100644 > --- a/stdlib/qsort.c > +++ b/stdlib/qsort.c > @@ -98,6 +98,7 @@ typedef struct > { > char *lo; > char *hi; > + size_t depth; > } stack_node; > > /* The stack needs log (total_elements) entries (we could even subtract > @@ -107,22 +108,85 @@ typedef struct > enum { STACK_SIZE =3D CHAR_BIT * sizeof (size_t) }; > > static inline stack_node * > -push (stack_node *top, char *lo, char *hi) > +push (stack_node *top, char *lo, char *hi, size_t depth) > { > top->lo =3D lo; > top->hi =3D hi; > + top->depth =3D depth; > return ++top; > } > > static inline stack_node * > -pop (stack_node *top, char **lo, char **hi) > +pop (stack_node *top, char **lo, char **hi, size_t *depth) > { > --top; > *lo =3D top->lo; > *hi =3D top->hi; > + *depth =3D top->depth; > return top; > } > > +/* NB: N is inclusive bound for BASE. */ > +static inline void > +siftdown (void *base, size_t size, size_t k, size_t n, > + enum swap_type_t swap_type, __compar_d_fn_t cmp, void *arg) > +{ > + while (k <=3D n / 2) > + { > + size_t j =3D 2 * k; > + if (j < n && cmp (base + (j * size), base + ((j + 1) * size), arg)= < 0) > + j++; > + > + if (cmp (base + (k * size), base + (j * size), arg) >=3D 0) > + break; > + > + do_swap (base + (size * j), base + (k * size), size, swap_type); > + k =3D j; > + } > +} > + > +static inline void > +heapify (void *base, size_t size, size_t n, enum swap_type_t swap_type, > + __compar_d_fn_t cmp, void *arg) > +{ > + size_t k =3D n / 2; > + while (1) > + { > + siftdown (base, size, k, n, swap_type, cmp, arg); > + if (k-- =3D=3D 0) > + break; > + } > +} > + > +/* A non-recursive heapsort, used on introsort implementation as a fallb= ack > + routine with worst-case performance of O(nlog n) and worst-case space > + complexity of O(1). It sorts the array starting at BASE and ending a= t > + END, with each element of SIZE bytes. The SWAP_TYPE is the callback > + function used to swap elements, and CMP is the function used to compa= re > + elements. */ > +static void > +heapsort_r (void *base, void *end, size_t size, enum swap_type_t swap_ty= pe, > + __compar_d_fn_t cmp, void *arg) > +{ > + const size_t count =3D ((uintptr_t) end - (uintptr_t) base) / size; > + > + if (count < 2) > + return; > + > + size_t n =3D count - 1; > + > + /* Build the binary heap, largest value at the base[0]. */ > + heapify (base, size, n, swap_type, cmp, arg); > + > + /* On each iteration base[0:n] is the binary heap, while base[n:count] > + is sorted. */ > + while (n > 0) > + { > + do_swap (base, base + (n * size), size, swap_type); > + n--; > + siftdown (base, size, 0, n, swap_type, cmp, arg); > + } > +} > > static inline void > insertion_sort_qsort_partitions (void *const pbase, size_t total_elems, > @@ -208,7 +272,7 @@ _quicksort (void *const pbase, size_t total_elems, si= ze_t size, > > const size_t max_thresh =3D MAX_THRESH * size; > > - if (total_elems =3D=3D 0) > + if (total_elems <=3D 1) > /* Avoid lossage with unsigned arithmetic below. */ > return; > > @@ -220,15 +284,26 @@ _quicksort (void *const pbase, size_t total_elems, = size_t size, > else > swap_type =3D SWAP_BYTES; > > + /* Maximum depth before quicksort switches to heapsort. */ > + size_t depth =3D 2 * (sizeof (size_t) * CHAR_BIT - 1 > + - __builtin_clzl (total_elems)); > + > if (total_elems > MAX_THRESH) > { > char *lo =3D base_ptr; > char *hi =3D &lo[size * (total_elems - 1)]; > stack_node stack[STACK_SIZE]; > - stack_node *top =3D stack + 1; > + stack_node *top =3D push (stack, NULL, NULL, depth); > > while (stack < top) > { > + if (depth =3D=3D 0) > + { > + heapsort_r (lo, hi, size, swap_type, cmp, arg); > + top =3D pop (top, &lo, &hi, &depth); > + continue; > + } > + > char *left_ptr; > char *right_ptr; > > @@ -292,7 +367,7 @@ _quicksort (void *const pbase, size_t total_elems, si= ze_t size, > { > if ((size_t) (hi - left_ptr) <=3D max_thresh) > /* Ignore both small partitions. */ > - top =3D pop (top, &lo, &hi); > + top =3D pop (top, &lo, &hi, &depth); > else > /* Ignore small left partition. */ > lo =3D left_ptr; > @@ -303,13 +378,13 @@ _quicksort (void *const pbase, size_t total_elems, = size_t size, > else if ((right_ptr - lo) > (hi - left_ptr)) > { > /* Push larger left partition indices. */ > - top =3D push (top, lo, right_ptr); > + top =3D push (top, lo, right_ptr, depth - 1); > lo =3D left_ptr; > } > else > { > /* Push larger right partition indices. */ > - top =3D push (top, left_ptr, hi); > + top =3D push (top, left_ptr, hi, depth - 1); > hi =3D right_ptr; > } > } > -- > 2.34.1 > LGTM Reviewed-by: Noah Goldstein