public inbox for glibc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libc/28591] New: _qsort.c: quicksort threshold for switching to insertion sort - should it be cleverer?
@ 2021-11-13  9:58 jsourceware at julianbradfield dot org
  0 siblings, 0 replies; only message in thread
From: jsourceware at julianbradfield dot org @ 2021-11-13  9:58 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=28591

            Bug ID: 28591
           Summary: _qsort.c: quicksort threshold for switching to
                    insertion sort - should it be cleverer?
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: jsourceware at julianbradfield dot org
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

_quicksort switches to insertion sort when the partition size drops below 4.
According to the comments, this is a magic number that worked on a Sun 4/80 (I
remember those).
A bit of experimenting suggests that 4 is only appropriate in certain cases.

I tried a few cases:

* small data, small number of data, very expensive comparison: sweet spot
around 25. (Sorting 1000 random positive ints by comparing their number of
divisors: thresh=25 is 25% faster than thresh=4)
* large data, small number of data, cheap comparison: sweet spot around 1000.
(Sorting blocks of 1M ints by their first element).
* small data, large number of data, cheap comparison: threshold larger than
available memory (sorting integers: sorting 1E8 random integers takes 1.5s with
threshold at 1E8 (i.e. pure insertion sort) as opposed to 19s with threshold=4)

The case where 4 is the right value is

* large data, very expensive comparison (sorting 1M blocks by first element
using number of divisors).

This is the cost model used in the original paper (comparison most expensive,
then swapping, then bookkeeping), but I wonder what the real life workload of
_quicksort is (on the rare occasions that it actually gets called rather than
mergesort).

Would glibc be interested in a study to find empirically good values of
MAX_THRESHOLD for the modern world? One could make it parametrized on datum
size and number of data, though not on cost of comparison unless it actually
measures some trial comparisons. If so, I could set a student on it next year.

Or, has it been done and I haven't found it?

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2021-11-13  9:58 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-13  9:58 [Bug libc/28591] New: _qsort.c: quicksort threshold for switching to insertion sort - should it be cleverer? jsourceware at julianbradfield dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).