public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] stdlib: Optimize number of calls to comparison function
@ 2023-12-02 21:48 Kuan-Wei Chiu
  2023-12-04  8:20 ` Florian Weimer
  2024-02-16  7:08 ` [PATCH] stdlib: Optimize number of calls to comparison function Kuan-Wei Chiu
  0 siblings, 2 replies; 22+ messages in thread
From: Kuan-Wei Chiu @ 2023-12-02 21:48 UTC (permalink / raw)
  To: libc-alpha; +Cc: adhemerval.zanella, goldstein.w.n, Kuan-Wei Chiu

The current heapsort implementation in the siftdown function follows
the standard textbook version, necessitating two comparisons at each
level. Transitioning to the Bottom-up heapsort version allows us to
decrease the required comparisons to just one per level. On average,
this modification significantly reduces the comparison count from
2nlog2(n) - 3n + o(n) to nlog2(n) + 0.37*n + o(n).

Refs:
  BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
  QUICKSORT (if n is not very small)
  Ingo Wegener
  Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
  https://doi.org/10.1016/0304-3975(93)90364-Y

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 stdlib/qsort.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/stdlib/qsort.c b/stdlib/qsort.c
index be01fb5598..f5babef150 100644
--- a/stdlib/qsort.c
+++ b/stdlib/qsort.c
@@ -132,26 +132,26 @@ static inline void
 siftdown (void *base, size_t size, size_t k, size_t n,
 	  enum swap_type_t swap_type, __compar_d_fn_t cmp, void *arg)
 {
-  /* There can only be a heap condition violation if there are
-     children.  */
-  while (2 * k + 1 <= n)
-    {
-      /* Left child.  */
-      size_t j = 2 * k + 1;
-      /* If the right child is larger, use it.  */
-      if (j < n && cmp (base + (j * size), base + ((j + 1) * size), arg) < 0)
-	j++;
-
-      /* If k is already >= to its children, we are done.  */
-      if (j == k || cmp (base + (k * size), base + (j * size), arg) >= 0)
-	break;
+  size_t i, j;
+
+  /* Find the sift-down path all the way to the leaves. */
+  for (i = k; j = 2 * i + 1, j + 1 <= n;)
+    i = cmp (base + j * size, base + (j + 1) * size, arg) >= 0 ? j : (j + 1);
 
-      /* Heal the violation.  */
-      do_swap (base + (size * j), base + (k * size), size, swap_type);
+  /* Special case for the last leaf with no sibling. */
+  if (j == n)
+    i = j;
 
-      /* Swapping with j may have introduced a violation at j.  Fix
-	 it in the next loop iteration.  */
-      k = j;
+  /* Backtrack to the correct location. */
+  while (i != k && cmp (base + k * size, base + i * size, arg) > 0)
+    i = (i - 1) / 2;
+
+  /* Shift the element into its correct place. */
+  j = i;
+  while (i != k)
+    {
+      i = (i - 1) / 2;
+      do_swap (base + i * size, base + j * size, size, swap_type);
     }
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2023-12-02 21:48 [PATCH] stdlib: Optimize number of calls to comparison function Kuan-Wei Chiu
@ 2023-12-04  8:20 ` Florian Weimer
  2023-12-04 18:31   ` Kuan-Wei Chiu
  2023-12-05  3:22   ` [PATCH v2 0/2] stdlib: Optimize number of calls to comparison function in qsort Kuan-Wei Chiu
  2024-02-16  7:08 ` [PATCH] stdlib: Optimize number of calls to comparison function Kuan-Wei Chiu
  1 sibling, 2 replies; 22+ messages in thread
From: Florian Weimer @ 2023-12-04  8:20 UTC (permalink / raw)
  To: Kuan-Wei Chiu; +Cc: libc-alpha, adhemerval.zanella, goldstein.w.n

* Kuan-Wei Chiu:

> The current heapsort implementation in the siftdown function follows
> the standard textbook version, necessitating two comparisons at each
> level. Transitioning to the Bottom-up heapsort version allows us to
> decrease the required comparisons to just one per level. On average,
> this modification significantly reduces the comparison count from
> 2nlog2(n) - 3n + o(n) to nlog2(n) + 0.37*n + o(n).

I think the factor in stdlib/tst-qsort5.c needs to be adjusted:

  /* This is an arbitrary factor which is true for the current
     implementation across a wide range of sizes.  */
  TEST_VERIFY (factor <= 4.5);

On the other hand, the heapsort code is not really expected to run at
all in the current implementation because median-of-three quicksort
usually avoids degenerating behavior.  If this heapsort variant uses a
number of comparisons that is competitive to quicksort, maybe we should
use it instead?  And use insertion sort for short arrays only.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2023-12-04  8:20 ` Florian Weimer
@ 2023-12-04 18:31   ` Kuan-Wei Chiu
  2023-12-05 10:44     ` Florian Weimer
  2023-12-05 20:35     ` Zack Weinberg
  2023-12-05  3:22   ` [PATCH v2 0/2] stdlib: Optimize number of calls to comparison function in qsort Kuan-Wei Chiu
  1 sibling, 2 replies; 22+ messages in thread
From: Kuan-Wei Chiu @ 2023-12-04 18:31 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, adhemerval.zanella, goldstein.w.n

On Mon, Dec 04, 2023 at 09:20:50AM +0100, Florian Weimer wrote:
> I think the factor in stdlib/tst-qsort5.c needs to be adjusted:
> 
>   /* This is an arbitrary factor which is true for the current
>      implementation across a wide range of sizes.  */
>   TEST_VERIFY (factor <= 4.5);
>

It seems that the factor can be adjusted to around 3.5. I can send
another patch for this adjustment or resend it as a patch series.

> On the other hand, the heapsort code is not really expected to run at
> all in the current implementation because median-of-three quicksort
> usually avoids degenerating behavior.  If this heapsort variant uses a
> number of comparisons that is competitive to quicksort, maybe we should
> use it instead?  And use insertion sort for short arrays only.
> 

This patch currently reduces the comparison count only in scenarios
where deep recursion results in a fallback to heapsort. According to
the referenced paper, beyond n >= 16000, the comparison count of
bottom-up heapsort is lower than that of median-of-three quicksort.
However, my concern is that quicksort typically exhibits better
locality of reference, making it more cache-friendly, while heapsort
lacks this advantage.


Best regards,
Kuan-Wei Chiu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 0/2] stdlib: Optimize number of calls to comparison function in qsort
  2023-12-04  8:20 ` Florian Weimer
  2023-12-04 18:31   ` Kuan-Wei Chiu
@ 2023-12-05  3:22   ` Kuan-Wei Chiu
  2023-12-05  3:22     ` [PATCH v2 1/2] " Kuan-Wei Chiu
  2023-12-05  3:22     ` [PATCH v2 2/2] stdlib: Adjust the factor in tst-qsort5 Kuan-Wei Chiu
  1 sibling, 2 replies; 22+ messages in thread
From: Kuan-Wei Chiu @ 2023-12-05  3:22 UTC (permalink / raw)
  To: fweimer; +Cc: libc-alpha, adhemerval.zanella, goldstein.w.n, Kuan-Wei Chiu

The current heapsort implementation in the siftdown function follows
the standard textbook version, necessitating two comparisons at each
level. Transitioning to the Bottom-up heapsort version allows us to
decrease the required comparisons to just one per level. On average,
this modification significantly reduces the comparison count from
2nlog2(n) - 3n + O(n) to nlog2(n) + 0.37*n + O(n).

Refs:
  BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
  QUICKSORT (if n is not very small)
  Ingo Wegener
  Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
  https://doi.org/10.1016/0304-3975(93)90364-Y

Changes from v1:
* Use a more concise title.
* Correct the error in asymptotic notation from little-o to big-O.
* Adjusted the factor in tst-qsort5 from 4.5 to 3.5.

Kuan-Wei Chiu (2):
  stdlib: Optimize number of calls to comparison function in qsort
  stdlib: Adjust the factor in tst-qsort5

 stdlib/qsort.c      | 36 ++++++++++++++++++------------------
 stdlib/tst-qsort5.c |  2 +-
 2 files changed, 19 insertions(+), 19 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/2] stdlib: Optimize number of calls to comparison function in qsort
  2023-12-05  3:22   ` [PATCH v2 0/2] stdlib: Optimize number of calls to comparison function in qsort Kuan-Wei Chiu
@ 2023-12-05  3:22     ` Kuan-Wei Chiu
  2023-12-05  3:22     ` [PATCH v2 2/2] stdlib: Adjust the factor in tst-qsort5 Kuan-Wei Chiu
  1 sibling, 0 replies; 22+ messages in thread
From: Kuan-Wei Chiu @ 2023-12-05  3:22 UTC (permalink / raw)
  To: fweimer; +Cc: libc-alpha, adhemerval.zanella, goldstein.w.n, Kuan-Wei Chiu

The current heapsort implementation in the siftdown function follows
the standard textbook version, necessitating two comparisons at each
level. Transitioning to the Bottom-up heapsort version allows us to
decrease the required comparisons to just one per level. On average,
this modification significantly reduces the comparison count from
2nlog2(n) - 3n + O(n) to nlog2(n) + 0.37*n + O(n).

Refs:
  BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
  QUICKSORT (if n is not very small)
  Ingo Wegener
  Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
  https://doi.org/10.1016/0304-3975(93)90364-Y

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
Changes from v1:
* Use a more concise title.
* Correct the error in asymptotic notation from little-o to big-O.

 stdlib/qsort.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/stdlib/qsort.c b/stdlib/qsort.c
index 62477010b6..0d924f5e2d 100644
--- a/stdlib/qsort.c
+++ b/stdlib/qsort.c
@@ -132,26 +132,26 @@ static inline void
 siftdown (void *base, size_t size, size_t k, size_t n,
 	  enum swap_type_t swap_type, __compar_d_fn_t cmp, void *arg)
 {
-  /* There can only be a heap condition violation if there are
-     children.  */
-  while (2 * k + 1 <= n)
-    {
-      /* Left child.  */
-      size_t j = 2 * k + 1;
-      /* If the right child is larger, use it.  */
-      if (j < n && cmp (base + (j * size), base + ((j + 1) * size), arg) < 0)
-	j++;
-
-      /* If k is already >= to its children, we are done.  */
-      if (j == k || cmp (base + (k * size), base + (j * size), arg) >= 0)
-	break;
+  size_t i, j;
+
+  /* Find the sift-down path all the way to the leaves. */
+  for (i = k; j = 2 * i + 1, j + 1 <= n;)
+    i = cmp (base + j * size, base + (j + 1) * size, arg) >= 0 ? j : (j + 1);
 
-      /* Heal the violation.  */
-      do_swap (base + (size * j), base + (k * size), size, swap_type);
+  /* Special case for the last leaf with no sibling. */
+  if (j == n)
+    i = j;
 
-      /* Swapping with j may have introduced a violation at j.  Fix
-	 it in the next loop iteration.  */
-      k = j;
+  /* Backtrack to the correct location. */
+  while (i != k && cmp (base + k * size, base + i * size, arg) > 0)
+    i = (i - 1) / 2;
+
+  /* Shift the element into its correct place. */
+  j = i;
+  while (i != k)
+    {
+      i = (i - 1) / 2;
+      do_swap (base + i * size, base + j * size, size, swap_type);
     }
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 2/2] stdlib: Adjust the factor in tst-qsort5
  2023-12-05  3:22   ` [PATCH v2 0/2] stdlib: Optimize number of calls to comparison function in qsort Kuan-Wei Chiu
  2023-12-05  3:22     ` [PATCH v2 1/2] " Kuan-Wei Chiu
@ 2023-12-05  3:22     ` Kuan-Wei Chiu
  1 sibling, 0 replies; 22+ messages in thread
From: Kuan-Wei Chiu @ 2023-12-05  3:22 UTC (permalink / raw)
  To: fweimer; +Cc: libc-alpha, adhemerval.zanella, goldstein.w.n, Kuan-Wei Chiu

Adjust the factor in the tst-qsort5 from 4.5 to 3.5 to better align
with the current implementation characteristics. This refinement is
prompted by the adoption of the bottom-up heapsort optimization, as
it significantly reduces the required comparisons.

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
Changes from v1:
* Adjusted the factor in tst-qsort5 from 4.5 to 3.5.

 stdlib/tst-qsort5.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/stdlib/tst-qsort5.c b/stdlib/tst-qsort5.c
index d3a88c30f8..63590967a6 100644
--- a/stdlib/tst-qsort5.c
+++ b/stdlib/tst-qsort5.c
@@ -155,7 +155,7 @@ check_one_n (size_t n)
           n, count, factor);
   /* This is an arbitrary factor which is true for the current
      implementation across a wide range of sizes.  */
-  TEST_VERIFY (factor <= 4.5);
+  TEST_VERIFY (factor <= 3.5);
 }
 
 static int
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2023-12-04 18:31   ` Kuan-Wei Chiu
@ 2023-12-05 10:44     ` Florian Weimer
  2023-12-05 20:00       ` Kuan-Wei Chiu
  2023-12-05 20:35     ` Zack Weinberg
  1 sibling, 1 reply; 22+ messages in thread
From: Florian Weimer @ 2023-12-05 10:44 UTC (permalink / raw)
  To: Kuan-Wei Chiu; +Cc: libc-alpha, adhemerval.zanella, goldstein.w.n

* Kuan-Wei Chiu:

> On Mon, Dec 04, 2023 at 09:20:50AM +0100, Florian Weimer wrote:
>> I think the factor in stdlib/tst-qsort5.c needs to be adjusted:
>> 
>>   /* This is an arbitrary factor which is true for the current
>>      implementation across a wide range of sizes.  */
>>   TEST_VERIFY (factor <= 4.5);
>>
>
> It seems that the factor can be adjusted to around 3.5. I can send
> another patch for this adjustment or resend it as a patch series.

Based on the paper, I would have expected a greater reduction.

>> On the other hand, the heapsort code is not really expected to run at
>> all in the current implementation because median-of-three quicksort
>> usually avoids degenerating behavior.  If this heapsort variant uses a
>> number of comparisons that is competitive to quicksort, maybe we should
>> use it instead?  And use insertion sort for short arrays only.
>> 
>
> This patch currently reduces the comparison count only in scenarios
> where deep recursion results in a fallback to heapsort. According to
> the referenced paper, beyond n >= 16000, the comparison count of
> bottom-up heapsort is lower than that of median-of-three quicksort.
> However, my concern is that quicksort typically exhibits better
> locality of reference, making it more cache-friendly, while heapsort
> lacks this advantage.

Yes, the performance numbers I posted suggest that something like that
is at play here.  Even though heapsort with your patch performs fewer
comparisons to most other implementations we have floating around
(except the historic merge sort), it is still slower than most.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2023-12-05 10:44     ` Florian Weimer
@ 2023-12-05 20:00       ` Kuan-Wei Chiu
  0 siblings, 0 replies; 22+ messages in thread
From: Kuan-Wei Chiu @ 2023-12-05 20:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, adhemerval.zanella, goldstein.w.n

On Tue, Dec 05, 2023 at 11:44:03AM +0100, Florian Weimer wrote:
> * Kuan-Wei Chiu:
> 
> > On Mon, Dec 04, 2023 at 09:20:50AM +0100, Florian Weimer wrote:
> >> I think the factor in stdlib/tst-qsort5.c needs to be adjusted:
> >> 
> >>   /* This is an arbitrary factor which is true for the current
> >>      implementation across a wide range of sizes.  */
> >>   TEST_VERIFY (factor <= 4.5);
> >>
> >
> > It seems that the factor can be adjusted to around 3.5. I can send
> > another patch for this adjustment or resend it as a patch series.
> 
> Based on the paper, I would have expected a greater reduction.
> 

The adjustment factor of 3.5 is derived from data obtained using the
tst-qsort5 test. Upon examining the number of elements sorted by
heapsort during the test and the corresponding comparison counts, the
comparison count is approximately nlog2(n) + 0.83*n. This is higher
than the analysis in the paper, which suggests nlog2(n) + d(n)*n, where
d(n) belongs to the range [0.34, 0.39]. I believe this discrepancy is
due to the paper's focus on the average comparison count in random
scenarios, whereas here, we are intentionally creating adversarial
permutations, making it not purely random.

Additionally, in the performance numbers you posted, d(n) is
approximately equal to 0.34, aligning with the analysis in the paper.

Here is the test data:
heapsort: number of comparisons, number of elements

Before applying the patch (commit 8e755f5bc8):

info: length 100: 1832 comparisons ~ 2.438387 * n * log (n)         heapsort: 662, 76
info: length 1000: 34020 comparisons ~ 3.260275 * n * log (n)       heapsort: 15324, 964
info: length 10000: 494661 comparisons ~ 3.772689 * n * log (n)     heapsort: 225309, 9948
info: length 20000: 1071780 comparisons ~ 3.859536 * n * log (n)    heapsort: 492534, 19944
info: length 30000: 1644872 comparisons ~ 3.830672 * n * log (n)    heapsort: 775626, 29944
info: length 40000: 2305025 comparisons ~ 3.933328 * n * log (n)    heapsort: 1065893, 39940
info: length 50000: 2912911 comparisons ~ 3.913976 * n * log (n)    heapsort: 1363779, 49940
info: length 60000: 3529897 comparisons ~ 3.902134 * n * log (n)    heapsort: 1670765, 59940
info: length 70000: 4285743 comparisons ~ 4.009278 * n * log (n)    heapsort: 1976733, 69936
info: length 80000: 4929859 comparisons ~ 3.998699 * n * log (n)    heapsort: 2290849, 79936
info: length 90000: 5575320 comparisons ~ 3.987933 * n * log (n)    heapsort: 2606310, 89936
info: length 100000: 6223572 comparisons ~ 3.978286 * n * log (n)   heapsort: 2924562, 99936
info: length 110000: 6881296 comparisons ~ 3.973322 * n * log (n)   heapsort: 3252286, 109936
info: length 120000: 7537050 comparisons ~ 3.966365 * n * log (n)   heapsort: 3578040, 119936
info: length 130000: 8191258 comparisons ~ 3.958248 * n * log (n)   heapsort: 3902248, 129936
info: length 140000: 9138135 comparisons ~ 4.072406 * n * log (n)   heapsort: 4239255, 139932
info: length 150000: 9822309 comparisons ~ 4.067167 * n * log (n)   heapsort: 4573429, 149932

After applying the patch:

info: length 100: 1690 comparisons ~ 2.273802 * n * log (n)         heapsort: 520, 76
info: length 1000: 28993 comparisons ~ 2.821755 * n * log (n)       heapsort: 10297, 964
info: length 10000: 409589 comparisons ~ 3.169480 * n * log (n)     heapsort: 140237, 9948
info: length 20000: 879212 comparisons ~ 3.211902 * n * log (n)     heapsort: 299966, 19944
info: length 30000: 1336536 comparisons ~ 3.158407 * n * log (n)    heapsort: 467290, 29944
info: length 40000: 1882150 comparisons ~ 3.256782 * n * log (n)    heapsort: 643018, 39940
info: length 50000: 2369603 comparisons ~ 3.228731 * n * log (n)    heapsort: 820471, 49940
info: length 60000: 2859511 comparisons ~ 3.205839 * n * log (n)    heapsort: 1000379, 59940
info: length 70000: 3492146 comparisons ~ 3.311278 * n * log (n)    heapsort: 1183136, 69936
info: length 80000: 4006690 comparisons ~ 3.294222 * n * log (n)    heapsort: 1367680, 79936
info: length 90000: 4523504 comparisons ~ 3.279727 * n * log (n)    heapsort: 1554494, 89936
info: length 100000: 5041697 comparisons ~ 3.266775 * n * log (n)   heapsort: 1742687, 99936
info: length 110000: 5562597 comparisons ~ 3.255888 * n * log (n)   heapsort: 1933587, 109936
info: length 120000: 6083545 comparisons ~ 3.245368 * n * log (n)   heapsort: 2124535, 119936
info: length 130000: 6604920 comparisons ~ 3.235434 * n * log (n)   heapsort: 2315910, 129936
info: length 140000: 7407274 comparisons ~ 3.344872 * n * log (n)   heapsort: 2508394, 139932
info: length 150000: 7951873 comparisons ~ 3.336444 * n * log (n)   heapsort: 2702993, 149932


Best regards,
Kuan-Wei Chiu
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2023-12-04 18:31   ` Kuan-Wei Chiu
  2023-12-05 10:44     ` Florian Weimer
@ 2023-12-05 20:35     ` Zack Weinberg
  2023-12-06 10:21       ` Florian Weimer
  1 sibling, 1 reply; 22+ messages in thread
From: Zack Weinberg @ 2023-12-05 20:35 UTC (permalink / raw)
  To: Kuan-Wei Chiu, Florian Weimer
  Cc: GNU libc development, Adhemerval Zanella, goldstein.w.n

On Mon, Dec 4, 2023, at 1:31 PM, Kuan-Wei Chiu wrote:
> On Mon, Dec 04, 2023 at 09:20:50AM +0100, Florian Weimer wrote:
>> I think the factor in stdlib/tst-qsort5.c needs to be adjusted:
>> 
>>   /* This is an arbitrary factor which is true for the current
>>      implementation across a wide range of sizes.  */
>>   TEST_VERIFY (factor <= 4.5);
>
> It seems that the factor can be adjusted to around 3.5. I can send
> another patch for this adjustment or resend it as a patch series.

Before you go any further with this patch series I think we need to
decide whether our backward compatibility constraints mean our qsort
has to be a stable sort.  If so, we should make it *always* be a
stable sort and write that into the documentation, and that would
mean junking the entire heapsort implementation.

I do not have a strong opinion either way, but I do think it should
either *always* or *never* be stable, i.e. not the previous state of
"whether you get a stable sort depends on the size of the input and
various other undocumented factors."  Also, I think that the
behavior should not depend on which version of glibc a program
was compiled against.

zw

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2023-12-05 20:35     ` Zack Weinberg
@ 2023-12-06 10:21       ` Florian Weimer
  2023-12-06 12:51         ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 22+ messages in thread
From: Florian Weimer @ 2023-12-06 10:21 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Kuan-Wei Chiu, GNU libc development, Adhemerval Zanella, goldstein.w.n

* Zack Weinberg:

> On Mon, Dec 4, 2023, at 1:31 PM, Kuan-Wei Chiu wrote:
>> On Mon, Dec 04, 2023 at 09:20:50AM +0100, Florian Weimer wrote:
>>> I think the factor in stdlib/tst-qsort5.c needs to be adjusted:
>>> 
>>>   /* This is an arbitrary factor which is true for the current
>>>      implementation across a wide range of sizes.  */
>>>   TEST_VERIFY (factor <= 4.5);
>>
>> It seems that the factor can be adjusted to around 3.5. I can send
>> another patch for this adjustment or resend it as a patch series.
>
> Before you go any further with this patch series I think we need to
> decide whether our backward compatibility constraints mean our qsort
> has to be a stable sort.  If so, we should make it *always* be a
> stable sort and write that into the documentation, and that would
> mean junking the entire heapsort implementation.

That makes sense, although there might not exist an in-place sorting
algorithm that takes constant extra space regardless of element count
and element size and has reasonable performance.  Maybe we could say,
“stable if element size is less than 1024 bytes” or something like that.

What I'm concerned about is that with the current implementation, we
take a performance hit *and* have compatibility problems.  The
compatibility problems would be easier to justify if we actually made
things faster.  Not calling malloc internally is unlikely to be
compelling to programmers.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2023-12-06 10:21       ` Florian Weimer
@ 2023-12-06 12:51         ` Adhemerval Zanella Netto
  2023-12-17 15:42           ` Zack Weinberg
  0 siblings, 1 reply; 22+ messages in thread
From: Adhemerval Zanella Netto @ 2023-12-06 12:51 UTC (permalink / raw)
  To: Florian Weimer, Zack Weinberg
  Cc: Kuan-Wei Chiu, GNU libc development, goldstein.w.n



On 06/12/23 07:21, Florian Weimer wrote:
> * Zack Weinberg:
> 
>> On Mon, Dec 4, 2023, at 1:31 PM, Kuan-Wei Chiu wrote:
>>> On Mon, Dec 04, 2023 at 09:20:50AM +0100, Florian Weimer wrote:
>>>> I think the factor in stdlib/tst-qsort5.c needs to be adjusted:
>>>>
>>>>   /* This is an arbitrary factor which is true for the current
>>>>      implementation across a wide range of sizes.  */
>>>>   TEST_VERIFY (factor <= 4.5);
>>>
>>> It seems that the factor can be adjusted to around 3.5. I can send
>>> another patch for this adjustment or resend it as a patch series.
>>
>> Before you go any further with this patch series I think we need to
>> decide whether our backward compatibility constraints mean our qsort
>> has to be a stable sort.  If so, we should make it *always* be a
>> stable sort and write that into the documentation, and that would
>> mean junking the entire heapsort implementation.
> 
> That makes sense, although there might not exist an in-place sorting
> algorithm that takes constant extra space regardless of element count
> and element size and has reasonable performance.  Maybe we could say,
> “stable if element size is less than 1024 bytes” or something like that.
> 
> What I'm concerned about is that with the current implementation, we
> take a performance hit *and* have compatibility problems.  The
> compatibility problems would be easier to justify if we actually made
> things faster.  Not calling malloc internally is unlikely to be
> compelling to programmers.

My initial intention was to remove internal multiple sorting strategies
that has different semantics and that might eventually trigger corner
cases issues (the stable default mergesort versus the fallback quicksort),
and fix the quicksort fallback that had the long standing issue of O(n^2) 
on adversarial inputs.

However it seems that glibc qsort now become an instance on Hyrum's law,
where it does not really matter that neither POSIX nor C standard promised 
sorting stability.

So I presume it would be better to keep with mergesort for all input sizes,
along with the limited stack buffer (so there is no failure point up a 
certain input number/size); and fallback to heapsort on memory allocation
failure.  I will prepare a patch to restore it.

Other system does provide mergesort as a different symbol [1], which also
returns a failure is case memory can not be allocated.  But I don't think
it would be an improvement in this situation (no one will really adapt
their code to use mergesort if a stable allocation is required).  I also
though about adding a tunable to enable mergesort on qsort, but it would
ending up only add more complexity and required more testing.

[1] https://man.freebsd.org/cgi/man.cgi?query=mergesort&sektion=3

 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2023-12-06 12:51         ` Adhemerval Zanella Netto
@ 2023-12-17 15:42           ` Zack Weinberg
  2023-12-17 15:55             ` Florian Weimer
  2023-12-17 16:47             ` Kuan-Wei Chiu
  0 siblings, 2 replies; 22+ messages in thread
From: Zack Weinberg @ 2023-12-17 15:42 UTC (permalink / raw)
  To: Adhemerval Zanella, Florian Weimer
  Cc: Kuan-Wei Chiu, GNU libc development, Noah Goldstein

On Wed, Dec 6, 2023, at 7:51 AM, Adhemerval Zanella Netto wrote:
> On 06/12/23 07:21, Florian Weimer wrote:
>> * Zack Weinberg:
>>>
>>> Before you go any further with this patch series I think we need to
>>> decide whether our backward compatibility constraints mean our qsort
>>> has to be a stable sort.  If so, we should make it *always* be a
>>> stable sort and write that into the documentation, and that would
>>> mean junking the entire heapsort implementation.
>>
>> That makes sense, although there might not exist an in-place sorting
>> algorithm that takes constant extra space regardless of element
>> count and element size and has reasonable performance.  Maybe we
>> could say, “stable if element size is less than 1024 bytes” or
>> something like that.

It occurred to me late last night that *any* in-place sorting algorithm
that operates on an array, can be made stable with no additional storage
cost, by using each element's address as a tiebreaker for the comparison
function.  It *doesn't* need to be the element's original address -- all
that matters is that for every pair of elements that compare equal, the
sorting algorithm preserves the < relation for their addresses across
all swaps, which should happen naturally if address-< is used as a
tiebreaker for comparisons.

(That said, I still think we should give that "block exchange mergesort"
algorithm I dug up the other week a try.  I might have time early in
January to put a patch set together.)

zw

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2023-12-17 15:42           ` Zack Weinberg
@ 2023-12-17 15:55             ` Florian Weimer
  2023-12-17 16:47             ` Kuan-Wei Chiu
  1 sibling, 0 replies; 22+ messages in thread
From: Florian Weimer @ 2023-12-17 15:55 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Adhemerval Zanella, Kuan-Wei Chiu, GNU libc development, Noah Goldstein

* Zack Weinberg:

> It occurred to me late last night that *any* in-place sorting algorithm
> that operates on an array, can be made stable with no additional storage
> cost, by using each element's address as a tiebreaker for the comparison
> function.  It *doesn't* need to be the element's original address -- all
> that matters is that for every pair of elements that compare equal, the
> sorting algorithm preserves the < relation for their addresses across
> all swaps, which should happen naturally if address-< is used as a
> tiebreaker for comparisons.

It's not as simple as that because we might have A < B, so we can move A
to a lower address than B, but at the same time time, we must make sure
that we do not move it past an element C with A == C, otherwise the
result of the tie-breaking address comparison changes.  I haven't really
thought about it, but I suspect it's possible to do for some of the
quadratic algorithms, but quicksort and heapsort—not so much.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2023-12-17 15:42           ` Zack Weinberg
  2023-12-17 15:55             ` Florian Weimer
@ 2023-12-17 16:47             ` Kuan-Wei Chiu
  2023-12-17 18:02               ` Florian Weimer
  1 sibling, 1 reply; 22+ messages in thread
From: Kuan-Wei Chiu @ 2023-12-17 16:47 UTC (permalink / raw)
  To: Zack Weinberg, Florian Weimer, Adhemerval Zanella
  Cc: GNU libc development, Noah Goldstein

On Sun, Dec 17, 2023 at 10:42:17AM -0500, Zack Weinberg wrote:
> It occurred to me late last night that *any* in-place sorting algorithm
> that operates on an array, can be made stable with no additional storage
> cost, by using each element's address as a tiebreaker for the comparison
> function.  It *doesn't* need to be the element's original address -- all
> that matters is that for every pair of elements that compare equal, the
> sorting algorithm preserves the < relation for their addresses across
> all swaps, which should happen naturally if address-< is used as a
> tiebreaker for comparisons.
> 
> (That said, I still think we should give that "block exchange mergesort"
> algorithm I dug up the other week a try.  I might have time early in
> January to put a patch set together.)

Additionally, if we're open to exploring non-in-place options, I think
Timsort might be worth considering. It's widely adopted, including in
Python and Android, serving as a stable hybrid sorting algorithm.
Timsort is an adaptive sort algorithm that operates in O(n) time when
the input is already sorted and maintains a worst-case time complexity
of O(n log n), similar to mergesort. I'd be happy to prepare a patch
for Timsort next month if needed.

Link: https://github.com/python/cpython/blob/main/Objects/listsort.txt

Best regards,
Kuan-Wei Chiu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2023-12-17 16:47             ` Kuan-Wei Chiu
@ 2023-12-17 18:02               ` Florian Weimer
  0 siblings, 0 replies; 22+ messages in thread
From: Florian Weimer @ 2023-12-17 18:02 UTC (permalink / raw)
  To: Kuan-Wei Chiu
  Cc: Zack Weinberg, Adhemerval Zanella, GNU libc development, Noah Goldstein

* Kuan-Wei Chiu:

> On Sun, Dec 17, 2023 at 10:42:17AM -0500, Zack Weinberg wrote:
>> It occurred to me late last night that *any* in-place sorting algorithm
>> that operates on an array, can be made stable with no additional storage
>> cost, by using each element's address as a tiebreaker for the comparison
>> function.  It *doesn't* need to be the element's original address -- all
>> that matters is that for every pair of elements that compare equal, the
>> sorting algorithm preserves the < relation for their addresses across
>> all swaps, which should happen naturally if address-< is used as a
>> tiebreaker for comparisons.
>> 
>> (That said, I still think we should give that "block exchange mergesort"
>> algorithm I dug up the other week a try.  I might have time early in
>> January to put a patch set together.)
>
> Additionally, if we're open to exploring non-in-place options, I think
> Timsort might be worth considering. It's widely adopted, including in
> Python and Android, serving as a stable hybrid sorting algorithm.
> Timsort is an adaptive sort algorithm that operates in O(n) time when
> the input is already sorted and maintains a worst-case time complexity
> of O(n log n), similar to mergesort. I'd be happy to prepare a patch
> for Timsort next month if needed.

I think we really want to get rid of that malloc in qsort.  This means
that we need something in-place-ish: we can throw a fair amount of stack
space at the problem (probably quite a bit more of the circa 1024 bytes
we use today), but not something that scales linearly with the size of
the input array.  We cannot not even even assume space for just one
temporary object on the stack, I think.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2023-12-02 21:48 [PATCH] stdlib: Optimize number of calls to comparison function Kuan-Wei Chiu
  2023-12-04  8:20 ` Florian Weimer
@ 2024-02-16  7:08 ` Kuan-Wei Chiu
  2024-03-27 19:45   ` Adhemerval Zanella Netto
  1 sibling, 1 reply; 22+ messages in thread
From: Kuan-Wei Chiu @ 2024-02-16  7:08 UTC (permalink / raw)
  To: libc-alpha; +Cc: adhemerval.zanella, goldstein.w.n, fweimer, zack

On Sun, Dec 03, 2023 at 05:48:39AM +0800, Kuan-Wei Chiu wrote:
> The current heapsort implementation in the siftdown function follows
> the standard textbook version, necessitating two comparisons at each
> level. Transitioning to the Bottom-up heapsort version allows us to
> decrease the required comparisons to just one per level. On average,
> this modification significantly reduces the comparison count from
> 2nlog2(n) - 3n + o(n) to nlog2(n) + 0.37*n + o(n).
> 
> Refs:
>   BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
>   QUICKSORT (if n is not very small)
>   Ingo Wegener
>   Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
>   https://doi.org/10.1016/0304-3975(93)90364-Y
> 
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
>  stdlib/qsort.c | 36 ++++++++++++++++++------------------
>  1 file changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/stdlib/qsort.c b/stdlib/qsort.c
> index be01fb5598..f5babef150 100644
> --- a/stdlib/qsort.c
> +++ b/stdlib/qsort.c
> @@ -132,26 +132,26 @@ static inline void
>  siftdown (void *base, size_t size, size_t k, size_t n,
>  	  enum swap_type_t swap_type, __compar_d_fn_t cmp, void *arg)
>  {
> -  /* There can only be a heap condition violation if there are
> -     children.  */
> -  while (2 * k + 1 <= n)
> -    {
> -      /* Left child.  */
> -      size_t j = 2 * k + 1;
> -      /* If the right child is larger, use it.  */
> -      if (j < n && cmp (base + (j * size), base + ((j + 1) * size), arg) < 0)
> -	j++;
> -
> -      /* If k is already >= to its children, we are done.  */
> -      if (j == k || cmp (base + (k * size), base + (j * size), arg) >= 0)
> -	break;
> +  size_t i, j;
> +
> +  /* Find the sift-down path all the way to the leaves. */
> +  for (i = k; j = 2 * i + 1, j + 1 <= n;)
> +    i = cmp (base + j * size, base + (j + 1) * size, arg) >= 0 ? j : (j + 1);
>  
> -      /* Heal the violation.  */
> -      do_swap (base + (size * j), base + (k * size), size, swap_type);
> +  /* Special case for the last leaf with no sibling. */
> +  if (j == n)
> +    i = j;
>  
> -      /* Swapping with j may have introduced a violation at j.  Fix
> -	 it in the next loop iteration.  */
> -      k = j;
> +  /* Backtrack to the correct location. */
> +  while (i != k && cmp (base + k * size, base + i * size, arg) > 0)
> +    i = (i - 1) / 2;
> +
> +  /* Shift the element into its correct place. */
> +  j = i;
> +  while (i != k)
> +    {
> +      i = (i - 1) / 2;
> +      do_swap (base + i * size, base + j * size, size, swap_type);
>      }
>  }
>  
> -- 
> 2.25.1
>

Since we still retain heapsort as a fallback for mergesort, should we
reconsider applying this patch?

Thanks,
Kuan-Wei

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2024-02-16  7:08 ` [PATCH] stdlib: Optimize number of calls to comparison function Kuan-Wei Chiu
@ 2024-03-27 19:45   ` Adhemerval Zanella Netto
  2024-03-27 19:59     ` Kuan-Wei Chiu
  0 siblings, 1 reply; 22+ messages in thread
From: Adhemerval Zanella Netto @ 2024-03-27 19:45 UTC (permalink / raw)
  To: Kuan-Wei Chiu, libc-alpha; +Cc: goldstein.w.n, fweimer, zack



On 16/02/24 04:08, Kuan-Wei Chiu wrote:
> On Sun, Dec 03, 2023 at 05:48:39AM +0800, Kuan-Wei Chiu wrote:
>> The current heapsort implementation in the siftdown function follows
>> the standard textbook version, necessitating two comparisons at each
>> level. Transitioning to the Bottom-up heapsort version allows us to
>> decrease the required comparisons to just one per level. On average,
>> this modification significantly reduces the comparison count from
>> 2nlog2(n) - 3n + o(n) to nlog2(n) + 0.37*n + o(n).
>>
>> Refs:
>>   BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
>>   QUICKSORT (if n is not very small)
>>   Ingo Wegener
>>   Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
>>   https://doi.org/10.1016/0304-3975(93)90364-Y
>>
>> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
>> ---
>>  stdlib/qsort.c | 36 ++++++++++++++++++------------------
>>  1 file changed, 18 insertions(+), 18 deletions(-)
>>
>> diff --git a/stdlib/qsort.c b/stdlib/qsort.c
>> index be01fb5598..f5babef150 100644
>> --- a/stdlib/qsort.c
>> +++ b/stdlib/qsort.c
>> @@ -132,26 +132,26 @@ static inline void
>>  siftdown (void *base, size_t size, size_t k, size_t n,
>>  	  enum swap_type_t swap_type, __compar_d_fn_t cmp, void *arg)
>>  {
>> -  /* There can only be a heap condition violation if there are
>> -     children.  */
>> -  while (2 * k + 1 <= n)
>> -    {
>> -      /* Left child.  */
>> -      size_t j = 2 * k + 1;
>> -      /* If the right child is larger, use it.  */
>> -      if (j < n && cmp (base + (j * size), base + ((j + 1) * size), arg) < 0)
>> -	j++;
>> -
>> -      /* If k is already >= to its children, we are done.  */
>> -      if (j == k || cmp (base + (k * size), base + (j * size), arg) >= 0)
>> -	break;
>> +  size_t i, j;
>> +
>> +  /* Find the sift-down path all the way to the leaves. */
>> +  for (i = k; j = 2 * i + 1, j + 1 <= n;)
>> +    i = cmp (base + j * size, base + (j + 1) * size, arg) >= 0 ? j : (j + 1);
>>  
>> -      /* Heal the violation.  */
>> -      do_swap (base + (size * j), base + (k * size), size, swap_type);
>> +  /* Special case for the last leaf with no sibling. */
>> +  if (j == n)
>> +    i = j;
>>  
>> -      /* Swapping with j may have introduced a violation at j.  Fix
>> -	 it in the next loop iteration.  */
>> -      k = j;
>> +  /* Backtrack to the correct location. */
>> +  while (i != k && cmp (base + k * size, base + i * size, arg) > 0)
>> +    i = (i - 1) / 2;
>> +
>> +  /* Shift the element into its correct place. */
>> +  j = i;
>> +  while (i != k)
>> +    {
>> +      i = (i - 1) / 2;
>> +      do_swap (base + i * size, base + j * size, size, swap_type);
>>      }
>>  }
>>  
>> -- 
>> 2.25.1
>>
> 
> Since we still retain heapsort as a fallback for mergesort, should we
> reconsider applying this patch?
> 
> Thanks,
> Kuan-Wei

Yes, I think it is work and it looks great to me.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2024-03-27 19:45   ` Adhemerval Zanella Netto
@ 2024-03-27 19:59     ` Kuan-Wei Chiu
  2024-03-27 20:42       ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 22+ messages in thread
From: Kuan-Wei Chiu @ 2024-03-27 19:59 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha, goldstein.w.n, fweimer, zack, jserv

On Wed, Mar 27, 2024 at 04:45:35PM -0300, Adhemerval Zanella Netto wrote:
> 
> 
> On 16/02/24 04:08, Kuan-Wei Chiu wrote:
> > On Sun, Dec 03, 2023 at 05:48:39AM +0800, Kuan-Wei Chiu wrote:
> >> The current heapsort implementation in the siftdown function follows
> >> the standard textbook version, necessitating two comparisons at each
> >> level. Transitioning to the Bottom-up heapsort version allows us to
> >> decrease the required comparisons to just one per level. On average,
> >> this modification significantly reduces the comparison count from
> >> 2nlog2(n) - 3n + o(n) to nlog2(n) + 0.37*n + o(n).
> >>
> >> Refs:
> >>   BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
> >>   QUICKSORT (if n is not very small)
> >>   Ingo Wegener
> >>   Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
> >>   https://doi.org/10.1016/0304-3975(93)90364-Y
> >>
> >> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> >> ---
> >>  stdlib/qsort.c | 36 ++++++++++++++++++------------------
> >>  1 file changed, 18 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/stdlib/qsort.c b/stdlib/qsort.c
> >> index be01fb5598..f5babef150 100644
> >> --- a/stdlib/qsort.c
> >> +++ b/stdlib/qsort.c
> >> @@ -132,26 +132,26 @@ static inline void
> >>  siftdown (void *base, size_t size, size_t k, size_t n,
> >>  	  enum swap_type_t swap_type, __compar_d_fn_t cmp, void *arg)
> >>  {
> >> -  /* There can only be a heap condition violation if there are
> >> -     children.  */
> >> -  while (2 * k + 1 <= n)
> >> -    {
> >> -      /* Left child.  */
> >> -      size_t j = 2 * k + 1;
> >> -      /* If the right child is larger, use it.  */
> >> -      if (j < n && cmp (base + (j * size), base + ((j + 1) * size), arg) < 0)
> >> -	j++;
> >> -
> >> -      /* If k is already >= to its children, we are done.  */
> >> -      if (j == k || cmp (base + (k * size), base + (j * size), arg) >= 0)
> >> -	break;
> >> +  size_t i, j;
> >> +
> >> +  /* Find the sift-down path all the way to the leaves. */
> >> +  for (i = k; j = 2 * i + 1, j + 1 <= n;)
> >> +    i = cmp (base + j * size, base + (j + 1) * size, arg) >= 0 ? j : (j + 1);
> >>  
> >> -      /* Heal the violation.  */
> >> -      do_swap (base + (size * j), base + (k * size), size, swap_type);
> >> +  /* Special case for the last leaf with no sibling. */
> >> +  if (j == n)
> >> +    i = j;
> >>  
> >> -      /* Swapping with j may have introduced a violation at j.  Fix
> >> -	 it in the next loop iteration.  */
> >> -      k = j;
> >> +  /* Backtrack to the correct location. */
> >> +  while (i != k && cmp (base + k * size, base + i * size, arg) > 0)
> >> +    i = (i - 1) / 2;
> >> +
> >> +  /* Shift the element into its correct place. */
> >> +  j = i;
> >> +  while (i != k)
> >> +    {
> >> +      i = (i - 1) / 2;
> >> +      do_swap (base + i * size, base + j * size, size, swap_type);
> >>      }
> >>  }
> >>  
> >> -- 
> >> 2.25.1
> >>
> > 
> > Since we still retain heapsort as a fallback for mergesort, should we
> > reconsider applying this patch?
> > 
> > Thanks,
> > Kuan-Wei
> 
> Yes, I think it is work and it looks great to me.

Should I resend a new patch, or can you directly apply this one?

Thanks,
Kuan-Wei

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2024-03-27 19:59     ` Kuan-Wei Chiu
@ 2024-03-27 20:42       ` Adhemerval Zanella Netto
  2024-09-01  2:09         ` Kuan-Wei Chiu
  0 siblings, 1 reply; 22+ messages in thread
From: Adhemerval Zanella Netto @ 2024-03-27 20:42 UTC (permalink / raw)
  To: Kuan-Wei Chiu; +Cc: libc-alpha, goldstein.w.n, fweimer, zack, jserv



On 27/03/24 16:59, Kuan-Wei Chiu wrote:
> On Wed, Mar 27, 2024 at 04:45:35PM -0300, Adhemerval Zanella Netto wrote:
>>
>>
>> On 16/02/24 04:08, Kuan-Wei Chiu wrote:
>>> On Sun, Dec 03, 2023 at 05:48:39AM +0800, Kuan-Wei Chiu wrote:
>>>> The current heapsort implementation in the siftdown function follows
>>>> the standard textbook version, necessitating two comparisons at each
>>>> level. Transitioning to the Bottom-up heapsort version allows us to
>>>> decrease the required comparisons to just one per level. On average,
>>>> this modification significantly reduces the comparison count from
>>>> 2nlog2(n) - 3n + o(n) to nlog2(n) + 0.37*n + o(n).
>>>>
>>>> Refs:
>>>>   BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
>>>>   QUICKSORT (if n is not very small)
>>>>   Ingo Wegener
>>>>   Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
>>>>   https://doi.org/10.1016/0304-3975(93)90364-Y
>>>>
>>>> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
>>>> ---
>>>>  stdlib/qsort.c | 36 ++++++++++++++++++------------------
>>>>  1 file changed, 18 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/stdlib/qsort.c b/stdlib/qsort.c
>>>> index be01fb5598..f5babef150 100644
>>>> --- a/stdlib/qsort.c
>>>> +++ b/stdlib/qsort.c
>>>> @@ -132,26 +132,26 @@ static inline void
>>>>  siftdown (void *base, size_t size, size_t k, size_t n,
>>>>  	  enum swap_type_t swap_type, __compar_d_fn_t cmp, void *arg)
>>>>  {
>>>> -  /* There can only be a heap condition violation if there are
>>>> -     children.  */
>>>> -  while (2 * k + 1 <= n)
>>>> -    {
>>>> -      /* Left child.  */
>>>> -      size_t j = 2 * k + 1;
>>>> -      /* If the right child is larger, use it.  */
>>>> -      if (j < n && cmp (base + (j * size), base + ((j + 1) * size), arg) < 0)
>>>> -	j++;
>>>> -
>>>> -      /* If k is already >= to its children, we are done.  */
>>>> -      if (j == k || cmp (base + (k * size), base + (j * size), arg) >= 0)
>>>> -	break;
>>>> +  size_t i, j;
>>>> +
>>>> +  /* Find the sift-down path all the way to the leaves. */
>>>> +  for (i = k; j = 2 * i + 1, j + 1 <= n;)
>>>> +    i = cmp (base + j * size, base + (j + 1) * size, arg) >= 0 ? j : (j + 1);
>>>>  
>>>> -      /* Heal the violation.  */
>>>> -      do_swap (base + (size * j), base + (k * size), size, swap_type);
>>>> +  /* Special case for the last leaf with no sibling. */
>>>> +  if (j == n)
>>>> +    i = j;
>>>>  
>>>> -      /* Swapping with j may have introduced a violation at j.  Fix
>>>> -	 it in the next loop iteration.  */
>>>> -      k = j;
>>>> +  /* Backtrack to the correct location. */
>>>> +  while (i != k && cmp (base + k * size, base + i * size, arg) > 0)
>>>> +    i = (i - 1) / 2;
>>>> +
>>>> +  /* Shift the element into its correct place. */
>>>> +  j = i;
>>>> +  while (i != k)
>>>> +    {
>>>> +      i = (i - 1) / 2;
>>>> +      do_swap (base + i * size, base + j * size, size, swap_type);
>>>>      }
>>>>  }
>>>>  
>>>> -- 
>>>> 2.25.1
>>>>
>>>
>>> Since we still retain heapsort as a fallback for mergesort, should we
>>> reconsider applying this patch?
>>>
>>> Thanks,
>>> Kuan-Wei
>>
>> Yes, I think it is work and it looks great to me.
> 
> Should I resend a new patch, or can you directly apply this one?

It applies as-is, so it should be ok. I will run some tests and apply, thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2024-03-27 20:42       ` Adhemerval Zanella Netto
@ 2024-09-01  2:09         ` Kuan-Wei Chiu
  2024-09-02 13:45           ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 22+ messages in thread
From: Kuan-Wei Chiu @ 2024-09-01  2:09 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha, goldstein.w.n, fweimer, zack, jserv

On Wed, Mar 27, 2024 at 05:42:21PM -0300, Adhemerval Zanella Netto wrote:
> 
> 
> On 27/03/24 16:59, Kuan-Wei Chiu wrote:
> > On Wed, Mar 27, 2024 at 04:45:35PM -0300, Adhemerval Zanella Netto wrote:
> >>
> >>
> >> On 16/02/24 04:08, Kuan-Wei Chiu wrote:
> >>> On Sun, Dec 03, 2023 at 05:48:39AM +0800, Kuan-Wei Chiu wrote:
> >>>> The current heapsort implementation in the siftdown function follows
> >>>> the standard textbook version, necessitating two comparisons at each
> >>>> level. Transitioning to the Bottom-up heapsort version allows us to
> >>>> decrease the required comparisons to just one per level. On average,
> >>>> this modification significantly reduces the comparison count from
> >>>> 2nlog2(n) - 3n + o(n) to nlog2(n) + 0.37*n + o(n).
> >>>>
> >>>> Refs:
> >>>>   BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
> >>>>   QUICKSORT (if n is not very small)
> >>>>   Ingo Wegener
> >>>>   Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
> >>>>   https://doi.org/10.1016/0304-3975(93)90364-Y
> >>>>
> >>>> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> >>>> ---
> >>>>  stdlib/qsort.c | 36 ++++++++++++++++++------------------
> >>>>  1 file changed, 18 insertions(+), 18 deletions(-)
> >>>>
> >>>> diff --git a/stdlib/qsort.c b/stdlib/qsort.c
> >>>> index be01fb5598..f5babef150 100644
> >>>> --- a/stdlib/qsort.c
> >>>> +++ b/stdlib/qsort.c
> >>>> @@ -132,26 +132,26 @@ static inline void
> >>>>  siftdown (void *base, size_t size, size_t k, size_t n,
> >>>>  	  enum swap_type_t swap_type, __compar_d_fn_t cmp, void *arg)
> >>>>  {
> >>>> -  /* There can only be a heap condition violation if there are
> >>>> -     children.  */
> >>>> -  while (2 * k + 1 <= n)
> >>>> -    {
> >>>> -      /* Left child.  */
> >>>> -      size_t j = 2 * k + 1;
> >>>> -      /* If the right child is larger, use it.  */
> >>>> -      if (j < n && cmp (base + (j * size), base + ((j + 1) * size), arg) < 0)
> >>>> -	j++;
> >>>> -
> >>>> -      /* If k is already >= to its children, we are done.  */
> >>>> -      if (j == k || cmp (base + (k * size), base + (j * size), arg) >= 0)
> >>>> -	break;
> >>>> +  size_t i, j;
> >>>> +
> >>>> +  /* Find the sift-down path all the way to the leaves. */
> >>>> +  for (i = k; j = 2 * i + 1, j + 1 <= n;)
> >>>> +    i = cmp (base + j * size, base + (j + 1) * size, arg) >= 0 ? j : (j + 1);
> >>>>  
> >>>> -      /* Heal the violation.  */
> >>>> -      do_swap (base + (size * j), base + (k * size), size, swap_type);
> >>>> +  /* Special case for the last leaf with no sibling. */
> >>>> +  if (j == n)
> >>>> +    i = j;
> >>>>  
> >>>> -      /* Swapping with j may have introduced a violation at j.  Fix
> >>>> -	 it in the next loop iteration.  */
> >>>> -      k = j;
> >>>> +  /* Backtrack to the correct location. */
> >>>> +  while (i != k && cmp (base + k * size, base + i * size, arg) > 0)
> >>>> +    i = (i - 1) / 2;
> >>>> +
> >>>> +  /* Shift the element into its correct place. */
> >>>> +  j = i;
> >>>> +  while (i != k)
> >>>> +    {
> >>>> +      i = (i - 1) / 2;
> >>>> +      do_swap (base + i * size, base + j * size, size, swap_type);
> >>>>      }
> >>>>  }
> >>>>  
> >>>> -- 
> >>>> 2.25.1
> >>>>
> >>>
> >>> Since we still retain heapsort as a fallback for mergesort, should we
> >>> reconsider applying this patch?
> >>>
> >>> Thanks,
> >>> Kuan-Wei
> >>
> >> Yes, I think it is work and it looks great to me.
> > 
> > Should I resend a new patch, or can you directly apply this one?
> 
> It applies as-is, so it should be ok. I will run some tests and apply, thanks.

Ping?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2024-09-01  2:09         ` Kuan-Wei Chiu
@ 2024-09-02 13:45           ` Adhemerval Zanella Netto
  2024-12-09  0:17             ` Kuan-Wei Chiu
  0 siblings, 1 reply; 22+ messages in thread
From: Adhemerval Zanella Netto @ 2024-09-02 13:45 UTC (permalink / raw)
  To: Kuan-Wei Chiu; +Cc: libc-alpha, goldstein.w.n, fweimer, zack, jserv



On 31/08/24 23:09, Kuan-Wei Chiu wrote:
> On Wed, Mar 27, 2024 at 05:42:21PM -0300, Adhemerval Zanella Netto wrote:
>>
>>
>> On 27/03/24 16:59, Kuan-Wei Chiu wrote:
>>> On Wed, Mar 27, 2024 at 04:45:35PM -0300, Adhemerval Zanella Netto wrote:
>>>>
>>>>
>>>> On 16/02/24 04:08, Kuan-Wei Chiu wrote:
>>>>> On Sun, Dec 03, 2023 at 05:48:39AM +0800, Kuan-Wei Chiu wrote:
>>>>>> The current heapsort implementation in the siftdown function follows
>>>>>> the standard textbook version, necessitating two comparisons at each
>>>>>> level. Transitioning to the Bottom-up heapsort version allows us to
>>>>>> decrease the required comparisons to just one per level. On average,
>>>>>> this modification significantly reduces the comparison count from
>>>>>> 2nlog2(n) - 3n + o(n) to nlog2(n) + 0.37*n + o(n).
>>>>>>
>>>>>> Refs:
>>>>>>   BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
>>>>>>   QUICKSORT (if n is not very small)
>>>>>>   Ingo Wegener
>>>>>>   Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
>>>>>>   https://doi.org/10.1016/0304-3975(93)90364-Y
>>>>>>
>>>>>> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
>>>>>> ---
>>>>>>  stdlib/qsort.c | 36 ++++++++++++++++++------------------
>>>>>>  1 file changed, 18 insertions(+), 18 deletions(-)
>>>>>>
>>>>>> diff --git a/stdlib/qsort.c b/stdlib/qsort.c
>>>>>> index be01fb5598..f5babef150 100644
>>>>>> --- a/stdlib/qsort.c
>>>>>> +++ b/stdlib/qsort.c
>>>>>> @@ -132,26 +132,26 @@ static inline void
>>>>>>  siftdown (void *base, size_t size, size_t k, size_t n,
>>>>>>  	  enum swap_type_t swap_type, __compar_d_fn_t cmp, void *arg)
>>>>>>  {
>>>>>> -  /* There can only be a heap condition violation if there are
>>>>>> -     children.  */
>>>>>> -  while (2 * k + 1 <= n)
>>>>>> -    {
>>>>>> -      /* Left child.  */
>>>>>> -      size_t j = 2 * k + 1;
>>>>>> -      /* If the right child is larger, use it.  */
>>>>>> -      if (j < n && cmp (base + (j * size), base + ((j + 1) * size), arg) < 0)
>>>>>> -	j++;
>>>>>> -
>>>>>> -      /* If k is already >= to its children, we are done.  */
>>>>>> -      if (j == k || cmp (base + (k * size), base + (j * size), arg) >= 0)
>>>>>> -	break;
>>>>>> +  size_t i, j;
>>>>>> +
>>>>>> +  /* Find the sift-down path all the way to the leaves. */
>>>>>> +  for (i = k; j = 2 * i + 1, j + 1 <= n;)
>>>>>> +    i = cmp (base + j * size, base + (j + 1) * size, arg) >= 0 ? j : (j + 1);
>>>>>>  
>>>>>> -      /* Heal the violation.  */
>>>>>> -      do_swap (base + (size * j), base + (k * size), size, swap_type);
>>>>>> +  /* Special case for the last leaf with no sibling. */
>>>>>> +  if (j == n)
>>>>>> +    i = j;
>>>>>>  
>>>>>> -      /* Swapping with j may have introduced a violation at j.  Fix
>>>>>> -	 it in the next loop iteration.  */
>>>>>> -      k = j;
>>>>>> +  /* Backtrack to the correct location. */
>>>>>> +  while (i != k && cmp (base + k * size, base + i * size, arg) > 0)
>>>>>> +    i = (i - 1) / 2;
>>>>>> +
>>>>>> +  /* Shift the element into its correct place. */
>>>>>> +  j = i;
>>>>>> +  while (i != k)
>>>>>> +    {
>>>>>> +      i = (i - 1) / 2;
>>>>>> +      do_swap (base + i * size, base + j * size, size, swap_type);
>>>>>>      }
>>>>>>  }
>>>>>>  
>>>>>> -- 
>>>>>> 2.25.1
>>>>>>
>>>>>
>>>>> Since we still retain heapsort as a fallback for mergesort, should we
>>>>> reconsider applying this patch?
>>>>>
>>>>> Thanks,
>>>>> Kuan-Wei
>>>>
>>>> Yes, I think it is work and it looks great to me.
>>>
>>> Should I resend a new patch, or can you directly apply this one?
>>
>> It applies as-is, so it should be ok. I will run some tests and apply, thanks.
> 
> Ping?

This was on my backlog, thanks to remind me.  I will commit this shortly.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] stdlib: Optimize number of calls to comparison function
  2024-09-02 13:45           ` Adhemerval Zanella Netto
@ 2024-12-09  0:17             ` Kuan-Wei Chiu
  0 siblings, 0 replies; 22+ messages in thread
From: Kuan-Wei Chiu @ 2024-12-09  0:17 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha, goldstein.w.n, fweimer, zack, jserv

Hi Adhemerval,

On Mon, Sep 02, 2024 at 10:45:16AM -0300, Adhemerval Zanella Netto wrote:
> 
> 
> On 31/08/24 23:09, Kuan-Wei Chiu wrote:
> > On Wed, Mar 27, 2024 at 05:42:21PM -0300, Adhemerval Zanella Netto wrote:
> >>
> >>
> >> On 27/03/24 16:59, Kuan-Wei Chiu wrote:
> >>> On Wed, Mar 27, 2024 at 04:45:35PM -0300, Adhemerval Zanella Netto wrote:
> >>>>
> >>>>
> >>>> On 16/02/24 04:08, Kuan-Wei Chiu wrote:
> >>>>> On Sun, Dec 03, 2023 at 05:48:39AM +0800, Kuan-Wei Chiu wrote:
> >>>>>> The current heapsort implementation in the siftdown function follows
> >>>>>> the standard textbook version, necessitating two comparisons at each
> >>>>>> level. Transitioning to the Bottom-up heapsort version allows us to
> >>>>>> decrease the required comparisons to just one per level. On average,
> >>>>>> this modification significantly reduces the comparison count from
> >>>>>> 2nlog2(n) - 3n + o(n) to nlog2(n) + 0.37*n + o(n).
> >>>>>>
> >>>>>> Refs:
> >>>>>>   BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
> >>>>>>   QUICKSORT (if n is not very small)
> >>>>>>   Ingo Wegener
> >>>>>>   Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
> >>>>>>   https://doi.org/10.1016/0304-3975(93)90364-Y
> >>>>>>
> >>>>>> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> >>>>>> ---
> >>>>>>  stdlib/qsort.c | 36 ++++++++++++++++++------------------
> >>>>>>  1 file changed, 18 insertions(+), 18 deletions(-)
> >>>>>>
> >>>>>> diff --git a/stdlib/qsort.c b/stdlib/qsort.c
> >>>>>> index be01fb5598..f5babef150 100644
> >>>>>> --- a/stdlib/qsort.c
> >>>>>> +++ b/stdlib/qsort.c
> >>>>>> @@ -132,26 +132,26 @@ static inline void
> >>>>>>  siftdown (void *base, size_t size, size_t k, size_t n,
> >>>>>>  	  enum swap_type_t swap_type, __compar_d_fn_t cmp, void *arg)
> >>>>>>  {
> >>>>>> -  /* There can only be a heap condition violation if there are
> >>>>>> -     children.  */
> >>>>>> -  while (2 * k + 1 <= n)
> >>>>>> -    {
> >>>>>> -      /* Left child.  */
> >>>>>> -      size_t j = 2 * k + 1;
> >>>>>> -      /* If the right child is larger, use it.  */
> >>>>>> -      if (j < n && cmp (base + (j * size), base + ((j + 1) * size), arg) < 0)
> >>>>>> -	j++;
> >>>>>> -
> >>>>>> -      /* If k is already >= to its children, we are done.  */
> >>>>>> -      if (j == k || cmp (base + (k * size), base + (j * size), arg) >= 0)
> >>>>>> -	break;
> >>>>>> +  size_t i, j;
> >>>>>> +
> >>>>>> +  /* Find the sift-down path all the way to the leaves. */
> >>>>>> +  for (i = k; j = 2 * i + 1, j + 1 <= n;)
> >>>>>> +    i = cmp (base + j * size, base + (j + 1) * size, arg) >= 0 ? j : (j + 1);
> >>>>>>  
> >>>>>> -      /* Heal the violation.  */
> >>>>>> -      do_swap (base + (size * j), base + (k * size), size, swap_type);
> >>>>>> +  /* Special case for the last leaf with no sibling. */
> >>>>>> +  if (j == n)
> >>>>>> +    i = j;
> >>>>>>  
> >>>>>> -      /* Swapping with j may have introduced a violation at j.  Fix
> >>>>>> -	 it in the next loop iteration.  */
> >>>>>> -      k = j;
> >>>>>> +  /* Backtrack to the correct location. */
> >>>>>> +  while (i != k && cmp (base + k * size, base + i * size, arg) > 0)
> >>>>>> +    i = (i - 1) / 2;
> >>>>>> +
> >>>>>> +  /* Shift the element into its correct place. */
> >>>>>> +  j = i;
> >>>>>> +  while (i != k)
> >>>>>> +    {
> >>>>>> +      i = (i - 1) / 2;
> >>>>>> +      do_swap (base + i * size, base + j * size, size, swap_type);
> >>>>>>      }
> >>>>>>  }
> >>>>>>  
> >>>>>> -- 
> >>>>>> 2.25.1
> >>>>>>
> >>>>>
> >>>>> Since we still retain heapsort as a fallback for mergesort, should we
> >>>>> reconsider applying this patch?
> >>>>>
> >>>>> Thanks,
> >>>>> Kuan-Wei
> >>>>
> >>>> Yes, I think it is work and it looks great to me.
> >>>
> >>> Should I resend a new patch, or can you directly apply this one?
> >>
> >> It applies as-is, so it should be ok. I will run some tests and apply, thanks.
> > 
> > Ping?
> 
> This was on my backlog, thanks to remind me.  I will commit this shortly.

I understand maintainers are very busy, so I rarely ping them. However,
another three months have passed, and I'm afraid this patch might have
been overlooked. I'd like to check on its current status.

Regards,
Kuan-Wei

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-12-09  0:17 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-02 21:48 [PATCH] stdlib: Optimize number of calls to comparison function Kuan-Wei Chiu
2023-12-04  8:20 ` Florian Weimer
2023-12-04 18:31   ` Kuan-Wei Chiu
2023-12-05 10:44     ` Florian Weimer
2023-12-05 20:00       ` Kuan-Wei Chiu
2023-12-05 20:35     ` Zack Weinberg
2023-12-06 10:21       ` Florian Weimer
2023-12-06 12:51         ` Adhemerval Zanella Netto
2023-12-17 15:42           ` Zack Weinberg
2023-12-17 15:55             ` Florian Weimer
2023-12-17 16:47             ` Kuan-Wei Chiu
2023-12-17 18:02               ` Florian Weimer
2023-12-05  3:22   ` [PATCH v2 0/2] stdlib: Optimize number of calls to comparison function in qsort Kuan-Wei Chiu
2023-12-05  3:22     ` [PATCH v2 1/2] " Kuan-Wei Chiu
2023-12-05  3:22     ` [PATCH v2 2/2] stdlib: Adjust the factor in tst-qsort5 Kuan-Wei Chiu
2024-02-16  7:08 ` [PATCH] stdlib: Optimize number of calls to comparison function Kuan-Wei Chiu
2024-03-27 19:45   ` Adhemerval Zanella Netto
2024-03-27 19:59     ` Kuan-Wei Chiu
2024-03-27 20:42       ` Adhemerval Zanella Netto
2024-09-01  2:09         ` Kuan-Wei Chiu
2024-09-02 13:45           ` Adhemerval Zanella Netto
2024-12-09  0:17             ` Kuan-Wei Chiu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).