From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 31164 invoked by alias); 17 Jul 2013 02:43:39 -0000 Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-bugs-owner@gcc.gnu.org Received: (qmail 28940 invoked by uid 48); 17 Jul 2013 02:41:36 -0000 From: "alexey.tourbin at gmail dot com" To: gcc-bugs@gcc.gnu.org Subject: [Bug libstdc++/57916] New: Improve std::sort paritioning by explicitly employing the pivot Date: Wed, 17 Jul 2013 02:43:00 -0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: new X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: libstdc++ X-Bugzilla-Version: unknown X-Bugzilla-Keywords: X-Bugzilla-Severity: enhancement X-Bugzilla-Who: alexey.tourbin at gmail dot com X-Bugzilla-Status: UNCONFIRMED X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter attachments.created Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-SW-Source: 2013-07/txt/msg00811.txt.bz2 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57916 Bug ID: 57916 Summary: Improve std::sort paritioning by explicitly employing the pivot Product: gcc Version: unknown Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: alexey.tourbin at gmail dot com Created attachment 30515 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30515&action=edit Move the median back to its pivot position. Exclude the pivot from further partitioning. In the original SGI code, partitioning does not use the concept of pivot explicitly. The routine __unguarded_partition(first, last, pivot) only cuts the sequence into two halves: [ <= pivot | >= pivot]. Note that the actual pivot element used for partitioning can end up anywhere within the sequence, not necessarily adjacent to the cut point. (Also, note that the name "__unguarded" indicates that certain boundary checks are omitted, which places some restrictions on how the pivot can be selected.) In 2009, Chris Jefferson introduced a new helper function, __unguarded_partition_pivot(first, last) which calls __unguarded_partition using the median of 3 elements as a pivot. The median is actually placed into the first element of the sequence and excluded from partitioning: [pivot| <= pivot | >= pivot]. (Since this pivot selection scheme satisfies the restrictions placed by __unguarded_partition, the "__unguarded" part ought to be excluded from the name of this helper function.) However, when partitioning is finished, this helper routine does NOT install the pivot back to its final position: [ <= pivot | pivot | >= pivot ]. Although it can be argued that the resulting code is still correct, it does not follow the common practice and misses an opportunity for optimization. Indeed, by installing the pivot back to its final position, we can exclude it from further partitioning stages. Although the change is small, it propagates throughout partitioning tree and produces a cumulative effect. Measurements show (see below) that this change improves std::sort running time by 2% to %3. The same technique of excluding the pivot can also be applied to __introselect helper function (used to implement the nth_element routine). Actually we can check whether the pivot returned by partitioning is exactly the nth element, and return immediately in this case. Although an improvement in running time, if any, is hard to measure, there is an important special case: when the nth_element is used to select the middle element of the sequence, and the sequence is already sorted, only one partitioning stage will be executed. I use the following program to evaluate the effect on the running time of std::sort. #include #include #include #include #define N (64 << 20) int a[N]; int main() { for (size_t i = 0; i < N; i++) a[i] = rand(); std::time_t start = std::clock(); std::sort(a, a + N); std::cout << (clock() - start) / (double) CLOCKS_PER_SEC << std::endl; return 0; } (before the change) $ g++ -O2 test-sort.cc && ./a.out 9.48 (after the change) $ g++ -I. -O2 test-sort.cc && ./a.out 9.29 Callgrind annotations also indicate an improvement - the number of instruction reads has actually dropped by 10%! (before this change) 16,344,046,167 PROGRAM TOTALS 10,249,743,434 ???:void std::__introsort_loop(int*, int*, long)'2 2,263,595,692 ???:main 1,742,665,662 /usr/src/debug/glibc-2.17-alt5/stdlib/random_r.c:random_r 1,140,850,688 /usr/src/debug/glibc-2.17-alt5/stdlib/random.c:random 677,198,672 ???:void std::__introsort_loop(int*, int*, long) 268,435,456 /usr/src/debug/glibc-2.17-alt5/stdlib/rand.c:rand (after the change) 14,976,731,729 PROGRAM TOTALS 9,180,509,971 ???:void std::__introsort_loop(int*, int*, long)'2 2,110,233,226 ???:main 1,742,665,662 /usr/src/debug/glibc-2.17-alt5/stdlib/random_r.c:random_r 1,140,850,688 /usr/src/debug/glibc-2.17-alt5/stdlib/random.c:random 532,670,993 ???:void std::__introsort_loop(int*, int*, long) 268,435,456 /usr/src/debug/glibc-2.17-alt5/stdlib/rand.c:rand