From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl1-x62d.google.com (mail-pl1-x62d.google.com [IPv6:2607:f8b0:4864:20::62d]) by sourceware.org (Postfix) with ESMTPS id C1CD53858CDB for ; Sat, 30 Dec 2023 19:59:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C1CD53858CDB Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org C1CD53858CDB Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::62d ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1703966390; cv=none; b=p95+jgpqATzvv2+Okqi+WX0ziJdVHxJRtffFcn3dOATC7ow3l4Hx5jY4Q8IrDO+Ngs9ejLxIc1HRVtjs4sJNDDHy4Lmgowfcz58bQmEEe8QSgtNk5vPVfguXr/mOQzldgzDU22PWHse4rUZOzQz/8pVBKKnMY6I1Fji+VydIopg= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1703966390; c=relaxed/simple; bh=9SqNoyc8yqtrJwmS/m4Cc2KpOMD8kiWkvm22vOFtsNc=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=tVcIyufEC/7NG6MEjP2wGXgaC5Kr8rHjS/eLhIuD/NMwsqJ3EEtIwlYDuYZZC9eYq8RBzDmjQBg/WB3FQYIl9qFwaGJRurPvIDbx1NaT1fQ5RmKGq1nH4dGdA1JRyTWqOTuRir7GntxhferiyvB8Pm6jOUBDpse339dISB0BdP8= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pl1-x62d.google.com with SMTP id d9443c01a7336-1d3efb6bfa3so17599265ad.0 for ; Sat, 30 Dec 2023 11:59:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1703966386; x=1704571186; darn=sourceware.org; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=+AIi9/aTcQ9HcHWizN/S3L8cyoz13XOGHZGVGOGMvpo=; b=bPD6nYTbVH2u/Gs7TiwnpSTDLAohEHCj0MbGhe0JzAk9SgoFRmzL9Hx3LcNfZDWu+b d2guBkWTSkoSldwcG19lyXIJcj4rP2XtHU2rCLH0+P+5RZLq5d3wXGg595jkIFt28U6P RecXSssfDsD1eQlPvn23lmDqXfyQlbpgFyd2UCSHQFWRIWAgu+MityIJ5G0h0J5ktu3L Jnf2CTk0nnlb7CKYYXwb/Rifrf2v4QWm6Vn+8MJ7JHTQPtjBKCDsMaBWjf/o6CkxHLlz ZtiF0MOjYGjS3O2bwEEwRKPlKOPcGlQoq+hscqJ2SuNL9kbOncBVKks64XRduUsIOjoH BgqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1703966386; x=1704571186; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=+AIi9/aTcQ9HcHWizN/S3L8cyoz13XOGHZGVGOGMvpo=; b=J6W+0wUOZtVp1d30wCnSO3Vv6gitKeoJTEGunLqvKpJe0Z2MATcuRxfl+EloOExAbC GS1OhCwBxvh8dx1CwNppIS0el5e38BJDXNOrVTqoDByFCT4ZRozXyhnEhcNVwCoW2YGd 5LiFW7FJVkXp7jIpd2Wsu2yzkfMqEFrh8JxPFXEKaVpNj5dcuET1QH7Mq5uy6MklGQ8C pj6WEI4VZOR/rLDw++2xkjadY6Ebvz/vhutlMi7rjRvkihhaQraJ9ea9OORM6UYdVy0L SENzcB1HfWAWzs1KMUItwJe9EGz7PGHHDVsoYATPjPK9rR0+OdKsz21OT3qMOQr/a6gi aplQ== X-Gm-Message-State: AOJu0YwGqjM82va0Bn2WtSwtee8mUqBeH4qOmGXEdhhmBPybOohEdT6A YnzTSdsjstyznaeLVFGdfgOkHxbGpj0= X-Google-Smtp-Source: AGHT+IGGa/PYIrTlckGskJXd+jCR/KWAemaA71SdRWqgs7Cadnd1LVemKRZCNrVuCqh2/WmxX3Gi+w== X-Received: by 2002:a17:902:ce83:b0:1d3:c3d4:86a2 with SMTP id f3-20020a170902ce8300b001d3c3d486a2mr26927127plg.2.1703966386090; Sat, 30 Dec 2023 11:59:46 -0800 (PST) Received: from localhost.localdomain ([140.116.154.65]) by smtp.gmail.com with ESMTPSA id k1-20020a170902ba8100b001d2ed17751asm17626780pls.261.2023.12.30.11.59.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 30 Dec 2023 11:59:45 -0800 (PST) From: Kuan-Wei Chiu To: newlib@sourceware.org Cc: Kuan-Wei Chiu Subject: [PATCH] Implement introsort for qsort Date: Sun, 31 Dec 2023 03:59:40 +0800 Message-Id: <20231230195940.1989559-1-visitorckw@gmail.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Enhances the qsort implementation by introducing introsort to address the worst-case time complexity of O(n^2) associated with quicksort. Introsort is utilized to switch to heapsort when quicksort recursion depth becomes excessive, ensuring a worst-case time complexity of O(n log n). The heapsort implementation adopts a bottom-up approach, significantly reducing the number of required comparisons and enhancing overall efficiency. Refs: Introspective Sorting and Selection Algorithms David R. Musser Software—Practice & Experience, 27(8); Pages 983–993, Aug 1997 https://dl.acm.org/doi/10.5555/261387.261395 A killer adversary for quicksort M. D. McIlroy Software—Practice & Experience, 29(4); Pages 341–344, 10 April 1999 https://dl.acm.org/doi/10.5555/311868.311871 BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average, QUICKSORT (if n is not very small) Ingo Wegener Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993 https://dl.acm.org/doi/10.5555/162625.162643 Signed-off-by: Kuan-Wei Chiu --- To assess the performance of the new introsort and the old quicksort in worst-case scenarios, we examined the number of comparisons required for sorting based on the paper "A killer adversary for quicksort." Before the patch: n = 1000, cmp_count = 100853 n = 2000, cmp_count = 395991 n = 3000, cmp_count = 885617 n = 4000, cmp_count = 1569692 n = 5000, cmp_count = 2448183 n = 6000, cmp_count = 3521140 n = 7000, cmp_count = 4788493 n = 8000, cmp_count = 6250342 n = 9000, cmp_count = 7906610 n = 10000, cmp_count = 9757353 After the patch: n = 1000, cmp_count = 27094 n = 2000, cmp_count = 61500 n = 3000, cmp_count = 100529 n = 4000, cmp_count = 136538 n = 5000, cmp_count = 182698 n = 6000, cmp_count = 221120 n = 7000, cmp_count = 259933 n = 8000, cmp_count = 299058 n = 9000, cmp_count = 356405 n = 10000, cmp_count = 397723 newlib/libc/search/qsort.c | 153 +++++++++++++++++++++++++++++++++++-- 1 file changed, 147 insertions(+), 6 deletions(-) diff --git a/newlib/libc/search/qsort.c b/newlib/libc/search/qsort.c index b53400aa8..61db73746 100644 --- a/newlib/libc/search/qsort.c +++ b/newlib/libc/search/qsort.c @@ -65,6 +65,7 @@ PORTABILITY #include <_ansi.h> #include #include +#include #ifndef __GNUC__ #define inline @@ -145,6 +146,130 @@ __unused :(CMP(thunk, b, c) > 0 ? b : (CMP(thunk, a, c) < 0 ? a : c )); } +/* + * Sifts the element at index 'i' down into the heap. + * + * This variant of siftdown follows a bottom-up approach, reducing the overall + * number of comparisons. The algorithm identifies the path for the element to + * sift down, reaching the leaves with only one comparison per level. + * Subsequently, it backtracks to determine the appropriate position to insert + * the target element. + * + * As elements tend to sift down closer to the leaves, this approach minimizes + * the number of comparisons compared to the traditional two per level descent. + * On average, it uses slightly more than half as many comparisons, with a + * worst-case scenario of 3/4th the comparisons. + */ +#if defined(I_AM_QSORT_R) +static inline void +__bsd_siftdown_r (void *a, + size_t i, + size_t n, + size_t es, + int swaptype, + void *thunk, + cmp_t *cmp) +#elif defined(I_AM_GNU_QSORT_R) +static inline void +siftdown_r (void *a, + size_t i, + size_t n, + size_t es, + int swaptype, + cmp_t *cmp, + void *thunk) +#else +#define thunk NULL +static inline void +siftdown (void *a, + size_t i, + size_t n, + size_t es, + int swaptype, + cmp_t *cmp) +#endif +{ + size_t j, k; + + /* Find the sift-down path all the way to the leaves. */ + for (j = i; k = 2 * j + 1, k + 1 < n;) + j = CMP(thunk, (char *) a + k * es, (char *) a + (k + 1) * es) > 0 ? k : (k + 1); + + /* Special case for the last leaf with no sibling. */ + if (k + 1 == n) + j = k; + + /* Backtrack to the correct location. */ + while (i != j && CMP(thunk, (char *) a + i * es, (char *) a + j * es) > 0) + j = (j - 1) / 2; + + /* Shift the element into its correct place. */ + k = j; + while (i != j) { + j = (j - 1) / 2; + swap((char *) a + j * es, (char *) a + k * es); + } +} + +/* + * Heapsort is employed to achieve O(n log n) worst-case time complexity and + * O(1) additional space complexity for introsort implementation. When + * quicksort recursion becomes too deep, switching to heapsort helps maintain + * efficiency. + */ +#if defined(I_AM_QSORT_R) +static void +__bsd_heapsort_r (void *a, + size_t n, + size_t es, + int swaptype, + void *thunk, + cmp_t *cmp) +#elif defined(I_AM_GNU_QSORT_R) +static void +heapsort_r (void *a, + size_t n, + size_t es, + int swaptype, + cmp_t *cmp, + void *thunk) +#else +#define thunk NULL +static void +heapsort (void *a, + size_t n, + size_t es, + int swaptype, + cmp_t *cmp) +#endif +{ + size_t i = n / 2; + + /* Build a max heap. */ + while(i) { + --i; +#if defined(I_AM_QSORT_R) + __bsd_siftdown_r(a, i, n, es, swaptype, thunk, cmp); +#elif defined(I_AM_GNU_QSORT_R) + siftdown_r(a, i, n, es, swaptype, cmp, thunk); +#else + siftdown(a, i, n, es, swaptype, cmp); +#endif + } + + /* Extract elements from the heap one by one. */ + for(i = n - 1; i; i--) { + swap((char *) a, (char *) a + i * es); +#if defined(I_AM_QSORT_R) + __bsd_siftdown_r(a, 0, i, es, swaptype, thunk, cmp); +#elif defined(I_AM_GNU_QSORT_R) + siftdown_r(a, 0, i, es, swaptype, cmp, thunk); +#else + siftdown(a, 0, i, es, swaptype, cmp); +#endif + } +} + /* * Classical function call recursion wastes a lot of stack space. Each * recursion level requires a full stack frame comprising all local variables @@ -189,7 +314,9 @@ qsort (void *a, int cmp_result; int swaptype, swap_cnt; size_t recursion_level = 0; - struct { void *a; size_t n; } parameter_stack[PARAMETER_STACK_LEVELS]; + size_t current_level = 0; + size_t max_level = 2 * (sizeof(size_t) * CHAR_BIT - 1 - __builtin_clzl(n)); + struct { void *a; size_t n; size_t level; } parameter_stack[PARAMETER_STACK_LEVELS]; SWAPINIT(a, es); loop: swap_cnt = 0; @@ -202,6 +329,18 @@ loop: swap_cnt = 0; goto pop; } + /* Switch to heapsort when the current level exceeds the maximum level. */ + if(++current_level > max_level) { +#if defined(I_AM_QSORT_R) + __bsd_heapsort_r(a, n, es, swaptype, thunk, cmp); +#elif defined(I_AM_GNU_QSORT_R) + heapsort_r(a, n, es, swaptype, thunk, cmp); +#else + heapsort(a, n, es, swaptype, cmp); +#endif + goto pop; + } + /* Select a pivot element, move it to the left. */ pm = (char *) a + (n / 2) * es; if (n > 7) { @@ -310,6 +449,7 @@ loop: swap_cnt = 0; */ parameter_stack[recursion_level].a = a; parameter_stack[recursion_level].n = n / es; + parameter_stack[recursion_level].level = current_level; recursion_level++; a = pa; n = r / es; @@ -318,15 +458,15 @@ loop: swap_cnt = 0; else { /* * The parameter_stack array is full. The smaller part - * is sorted using function call recursion. The larger - * part will be sorted after the function call returns. + * is sorted using heapsort. The larger part will be + * sorted after the function call returns. */ #if defined(I_AM_QSORT_R) - __bsd_qsort_r(pa, r / es, es, thunk, cmp); + __bsd_heapsort_r(pa, r / es, es, swaptype, thunk, cmp); #elif defined(I_AM_GNU_QSORT_R) - qsort_r(pa, r / es, es, cmp, thunk); + heapsort_r(pa, r / es, es, swaptype, cmp, thunk); #else - qsort(pa, r / es, es, cmp); + heapsort(pa, r / es, es, swaptype, cmp); #endif } } @@ -340,6 +480,7 @@ pop: recursion_level--; a = parameter_stack[recursion_level].a; n = parameter_stack[recursion_level].n; + current_level = parameter_stack[recursion_level].level; goto loop; } } -- 2.25.1