From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 599833858410; Sat, 13 Nov 2021 09:58:57 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 599833858410 From: "jsourceware at julianbradfield dot org" To: glibc-bugs@sourceware.org Subject: [Bug libc/28591] New: _qsort.c: quicksort threshold for switching to insertion sort - should it be cleverer? Date: Sat, 13 Nov 2021 09:58:57 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: new X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: glibc X-Bugzilla-Component: libc X-Bugzilla-Version: unspecified X-Bugzilla-Keywords: X-Bugzilla-Severity: enhancement X-Bugzilla-Who: jsourceware at julianbradfield dot org X-Bugzilla-Status: UNCONFIRMED X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: unassigned at sourceware dot org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter cc target_milestone Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://sourceware.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: glibc-bugs@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Glibc-bugs mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Nov 2021 09:58:57 -0000 https://sourceware.org/bugzilla/show_bug.cgi?id=3D28591 Bug ID: 28591 Summary: _qsort.c: quicksort threshold for switching to insertion sort - should it be cleverer? Product: glibc Version: unspecified Status: UNCONFIRMED Severity: enhancement Priority: P2 Component: libc Assignee: unassigned at sourceware dot org Reporter: jsourceware at julianbradfield dot org CC: drepper.fsp at gmail dot com Target Milestone: --- _quicksort switches to insertion sort when the partition size drops below 4. According to the comments, this is a magic number that worked on a Sun 4/80= (I remember those). A bit of experimenting suggests that 4 is only appropriate in certain cases. I tried a few cases: * small data, small number of data, very expensive comparison: sweet spot around 25. (Sorting 1000 random positive ints by comparing their number of divisors: thresh=3D25 is 25% faster than thresh=3D4) * large data, small number of data, cheap comparison: sweet spot around 100= 0. (Sorting blocks of 1M ints by their first element). * small data, large number of data, cheap comparison: threshold larger than available memory (sorting integers: sorting 1E8 random integers takes 1.5s = with threshold at 1E8 (i.e. pure insertion sort) as opposed to 19s with threshol= d=3D4) The case where 4 is the right value is * large data, very expensive comparison (sorting 1M blocks by first element using number of divisors). This is the cost model used in the original paper (comparison most expensiv= e, then swapping, then bookkeeping), but I wonder what the real life workload = of _quicksort is (on the rare occasions that it actually gets called rather th= an mergesort). Would glibc be interested in a study to find empirically good values of MAX_THRESHOLD for the modern world? One could make it parametrized on datum size and number of data, though not on cost of comparison unless it actually measures some trial comparisons. If so, I could set a student on it next ye= ar. Or, has it been done and I haven't found it? --=20 You are receiving this mail because: You are on the CC list for the bug.=