public inbox for newlib@sourceware.org
 help / color / mirror / Atom feed
From: Kuan-Wei Chiu <visitorckw@gmail.com>
To: newlib@sourceware.org, brian.inglis@systematicsw.ab.ca
Subject: Re: [PATCH] Implement introsort for qsort
Date: Sun, 14 Jan 2024 01:25:46 +0800	[thread overview]
Message-ID: <ZaLHmpsQK3mSPo0x@visitorckw-System-Product-Name> (raw)
In-Reply-To: <20231230195940.1989559-1-visitorckw@gmail.com>

On Sun, Dec 31, 2023 at 03:59:40AM +0800, Kuan-Wei Chiu wrote:
> Enhances the qsort implementation by introducing introsort to address
> the worst-case time complexity of O(n^2) associated with quicksort.
> Introsort is utilized to switch to heapsort when quicksort recursion
> depth becomes excessive, ensuring a worst-case time complexity of
> O(n log n).
> 
> The heapsort implementation adopts a bottom-up approach, significantly
> reducing the number of required comparisons and enhancing overall
> efficiency.
> 
> Refs:
>   Introspective Sorting and Selection Algorithms
>   David R. Musser
>   Software—Practice & Experience, 27(8); Pages 983–993, Aug 1997
>   https://dl.acm.org/doi/10.5555/261387.261395
> 
>   A killer adversary for quicksort
>   M. D. McIlroy
>   Software—Practice & Experience, 29(4); Pages 341–344, 10 April 1999
>   https://dl.acm.org/doi/10.5555/311868.311871
> 
>   BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
>   QUICKSORT (if n is not very small)
>   Ingo Wegener
>   Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
>   https://dl.acm.org/doi/10.5555/162625.162643
> 
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
> To assess the performance of the new introsort and the old quicksort in
> worst-case scenarios, we examined the number of comparisons required
> for sorting based on the paper "A killer adversary for quicksort."
> 
> Before the patch:
> n = 1000,  cmp_count = 100853
> n = 2000,  cmp_count = 395991
> n = 3000,  cmp_count = 885617
> n = 4000,  cmp_count = 1569692
> n = 5000,  cmp_count = 2448183
> n = 6000,  cmp_count = 3521140
> n = 7000,  cmp_count = 4788493
> n = 8000,  cmp_count = 6250342
> n = 9000,  cmp_count = 7906610
> n = 10000,  cmp_count = 9757353
> 
> After the patch:
> n = 1000,  cmp_count = 27094
> n = 2000,  cmp_count = 61500
> n = 3000,  cmp_count = 100529
> n = 4000,  cmp_count = 136538
> n = 5000,  cmp_count = 182698
> n = 6000,  cmp_count = 221120
> n = 7000,  cmp_count = 259933
> n = 8000,  cmp_count = 299058
> n = 9000,  cmp_count = 356405
> n = 10000,  cmp_count = 397723
> 
>  newlib/libc/search/qsort.c | 153 +++++++++++++++++++++++++++++++++++--
>  1 file changed, 147 insertions(+), 6 deletions(-)

Hi Brian,

Thank you for your feedback, and I apologize for the late response.

I found your reply in the archives of the newlib mailing list on the
web interface. However, for some reason, I did not find the same email
in my email client.

Upon reviewing the previously mentioned issue, it seems that although
both instances lead to a worst-case degradation to O(n^2), the causes
appear to be different. One may be the adoption of insertion sort at an
inappropriate time, causing degradation, while the other could be
adversarial input leading to the selection of an unsuitable pivot. I
can delve deeper into this issue later, but I believe it should be
different patches.

I looked at the code at BSD implementation [1], but I am skeptical that
it would address the problem of adversarial input degrading to O(n^2).
I will conduct experiments in subsequent emails to verify this.

Thank you for reminding me to consider not only the number of
comparisons but also the number of swaps. Indeed, switching to heapsort
may likely increase the required number of swaps, but while the number
of swaps would be at an O(n log n) level, the number of comparisons
would be at an O(n^2) level. Therefore, a trade-off needs to be made,
and I will conduct further experiments. If increasing the number of
swaps does lead to degradation in efficiency, it would be better to
drop this patch.

Regarding the experimental data, as I am not a native English speaker,
I would like to inquire about the meaning of "two values out of order
repeated to fill the array, in alternating rows and alternate halves."
This clarification will help me avoid any misunderstanding of the text.
I will post new experimental data within the next few days.

Once again, thank you for your feedback and suggestions!

Best regards,
Kuan-Wei Chiu

Link: https://cgit.freebsd.org/src/tree/lib/libc/stdlib/qsort.c [1]


  parent reply	other threads:[~2024-01-13 17:25 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-30 19:59 Kuan-Wei Chiu
2023-12-31  6:20 ` brian.inglis
2024-01-13 17:25 ` Kuan-Wei Chiu [this message]
2024-01-14 18:04   ` brian.inglis
2024-01-15 23:38 ` Kuan-Wei Chiu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZaLHmpsQK3mSPo0x@visitorckw-System-Product-Name \
    --to=visitorckw@gmail.com \
    --cc=brian.inglis@systematicsw.ab.ca \
    --cc=newlib@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).