* __inplace_stable_sort improvement @ 2021-11-18 12:58 David de Kloet 2021-11-24 18:33 ` François Dumont 0 siblings, 1 reply; 6+ messages in thread From: David de Kloet @ 2021-11-18 12:58 UTC (permalink / raw) To: libstdc++ Hello libstdc++, Apologies if this is the wrong list for my message. I believe I have an improvement for __inplace_stable_sort and I'm wondering if this is worth pursuing and if so how to proceed. __inplace_stable_sort does a merge sort where in each recursive step the 2 sublists are merged in-place. The merge step is itself also recursive. In order to merge [A, B) and [B, C), a tail section S1 of [A, B) and a head section S2 of [B, C) are found such that all elements in S1 are greater than all elements in S2 and then those sections are swapped/rotated and the shorter sections are merged recursively. This is done by the function __merge_without_buffer. In the current implementation, __merge_without_buffer takes the half point of the longer section and does a binary search (via lower_bound or upper_bound) to find the corresponding cut point on the other section. Then it rotates the sections. My proposal is to find the cut points such that both rotated sections have the same length such that they can always simply be swapped. I have implemented this and it seems to perform better in general and in the special cases that I've tried. Unfortunately, since we are no longer guaranteed to halve the longer section in each step, there is a possibility for a stack overflow. To avoid this, the new algorithm falls back to the old behavior whenever the sections have significantly different lengths. Also, where the current algorithm performs the binary search implicitly via lower_bound/upper_bound, the new algorithm has to implement the binary search explicitly because it has to look at both sections to determine the next step in the search. I have created wrappers for my value objects as well as my iterators to count the number of operations performed by the current and my algorithm and run benchmarks on various distributions. My code is at the bottom but below is the output as run on my machine. For each operation it outputs the ratio between the number of operations performed by my algorithm over the number performed by the current algorithm. Thanks for your attention, David Distribution: random_unique(100000) Example: [19, 13, 5, 0, 10, 6, 15, 11, 16, 9, 14, 18, 4, 17, 1, 7, 8, 12, 3, 2] move: 0.588026 = 3331848 / 5666154 moveAssign: 0.594789 = 6852819 / 11521431 compare: 0.914997 = 2028977 / 2217469 iteratorDeref: 0.718466 = 11001863 / 15312994 iteratorCopy: 0.791332 = 33977189 / 42936712 iteratorEquals: 2.27841 = 5174606 / 2271143 iteratorInc: 1.2143 = 7697971 / 6339449 iteratorDec: 0.0833529 = 473556 / 5681340 iteratorDiff: 0.444865 = 1362598 / 3062950 iteratorSub: 0.0200683 = 6272 / 312533 iteratorAdd: 0.0695467 = 57149 / 821736 iteratorAddAssign: 1.36491 = 3141949 / 2301950 iteratorArithmetic: 0.861623 = 17914101 / 20791101 micros: 0.663704 = 650825 / 980596 Distribution: random_dupes(100000) Example: [4, 4, 1, 3, 0, 3, 2, 2, 4, 4, 0, 3, 1, 0, 3, 2, 0, 2, 1, 1] move: 0.570041 = 2949623 / 5174401 moveAssign: 0.577736 = 6087819 / 10537375 compare: 0.845419 = 1530877 / 1810790 iteratorDeref: 0.678233 = 9232869 / 13613114 iteratorCopy: 0.742699 = 25652244 / 34539225 iteratorEquals: 2.41643 = 4123881 / 1706601 iteratorInc: 1.15617 = 6554374 / 5669028 iteratorDec: 0.101004 = 539991 / 5346219 iteratorDiff: 0.417093 = 959972 / 2301580 iteratorSub: 0.0350046 = 8882 / 253738 iteratorAdd: 0.0972571 = 64122 / 659304 iteratorAddAssign: 1.2279 = 2247197 / 1830107 iteratorArithmetic: 0.81605 = 14498419 / 17766577 micros: 0.699386 = 554426 / 792732 Distribution: increasing_unique(100000) Example: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] move: 1 = 91808 / 91808 moveAssign: 1 = 91808 / 91808 compare: 0.573591 = 224558 / 391495 iteratorDeref: 0.722385 = 540924 / 748803 iteratorCopy: 0.298175 = 976735 / 3275708 iteratorEquals: 0.522302 = 116383 / 222827 iteratorInc: 0.715444 = 132750 / 185549 iteratorDec: 1 = 91808 / 91808 iteratorDiff: 0.217455 = 40956 / 188342 iteratorAdd: 1 = 16383 / 16383 iteratorAddAssign: 0.299523 = 81884 / 273381 iteratorArithmetic: 0.49082 = 480164 / 978290 micros: 0.479792 = 22935 / 47802 Distribution: increasing_dupes(100000) Example: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4] move: 1 = 91808 / 91808 moveAssign: 1 = 91808 / 91808 compare: 0.573591 = 224558 / 391495 iteratorDeref: 0.722385 = 540924 / 748803 iteratorCopy: 0.298175 = 976735 / 3275708 iteratorEquals: 0.522302 = 116383 / 222827 iteratorInc: 0.715444 = 132750 / 185549 iteratorDec: 1 = 91808 / 91808 iteratorDiff: 0.217455 = 40956 / 188342 iteratorAdd: 1 = 16383 / 16383 iteratorAddAssign: 0.299523 = 81884 / 273381 iteratorArithmetic: 0.49082 = 480164 / 978290 micros: 0.449154 = 20675 / 46031 Distribution: mostly_increasing(100000) Example: [3, 2, 3, 4, 5, 5, 6, 7, 9, 9, 11, 13, 12, 14, 15, 17, 17, 18, 21, 20] move: 0.681093 = 814725 / 1196203 moveAssign: 0.703421 = 1809563 / 2572519 compare: 0.924016 = 965320 / 1044700 iteratorDeref: 0.847765 = 3802634 / 4485480 iteratorCopy: 0.804068 = 11859111 / 14748894 iteratorEquals: 1.59826 = 1388752 / 868913 iteratorInc: 1.25659 = 1818421 / 1447111 iteratorDec: 0.383738 = 537108 / 1399674 iteratorDiff: 0.486488 = 562273 / 1155779 iteratorSub: 0.0638659 = 6666 / 104375 iteratorAdd: 0.184194 = 55586 / 301780 iteratorAddAssign: 1.30369 = 1105439 / 847932 iteratorArithmetic: 0.893672 = 5474245 / 6125564 micros: 0.790603 = 233953 / 295917 Distribution: decreasing_unique(100000) Example: [19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0] move: 0.40597 = 995312 / 2451688 moveAssign: 0.45785 = 2459840 / 5372592 compare: 0.489247 = 151631 / 309927 iteratorDeref: 0.524798 = 3415934 / 6509043 iteratorCopy: 0.563628 = 5971275 / 10594360 iteratorEquals: 1.18395 = 843533 / 712475 iteratorInc: 0.864454 = 1460463 / 1689462 iteratorDec: 0.373125 = 1606976 / 4306800 iteratorDiff: 0.495226 = 235930 / 476409 iteratorSub: 0.248254 = 13824 / 55685 iteratorAdd: 0.590877 = 135839 / 229894 iteratorAddAssign: 0.421852 = 119646 / 283621 iteratorArithmetic: 0.569514 = 4416211 / 7754346 micros: 0.543145 = 167865 / 309061 Distribution: decreasing_dupes(100000) Example: [4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0] move: 0.741695 = 1455276 / 1962096 moveAssign: 0.736048 = 2826598 / 3840238 compare: 0.641679 = 282851 / 440798 iteratorDeref: 0.740565 = 3386062 / 4572271 iteratorCopy: 0.628798 = 5008472 / 7965150 iteratorEquals: 1.50206 = 794354 / 528843 iteratorInc: 1.39602 = 2164546 / 1550510 iteratorDec: 0.336639 = 837728 / 2488504 iteratorDiff: 0.313457 = 88552 / 282501 iteratorSub: 0.165388 = 3880 / 23460 iteratorAdd: 0.377644 = 27139 / 71864 iteratorAddAssign: 0.551364 = 182383 / 330785 iteratorArithmetic: 0.776766 = 4098582 / 5276467 micros: 0.71282 = 160398 / 225019 Distribution: mostly_decreasing(100000) Example: [20, 19, 20, 18, 18, 15, 13, 13, 14, 13, 10, 9, 10, 8, 5, 7, 6, 4, 3, 2] move: 0.760602 = 2197435 / 2889075 moveAssign: 0.768473 = 4591313 / 5974593 compare: 0.932913 = 977678 / 1047984 iteratorDeref: 0.838641 = 6633169 / 7909424 iteratorCopy: 0.829028 = 15447960 / 18633830 iteratorEquals: 1.7009 = 2000686 / 1176251 iteratorInc: 1.52771 = 3788203 / 2479657 iteratorDec: 0.363843 = 1369990 / 3765332 iteratorDiff: 0.489186 = 595472 / 1217270 iteratorSub: 0.0877827 = 10925 / 124455 iteratorAdd: 0.201991 = 69640 / 344767 iteratorAddAssign: 1.32756 = 1133512 / 853830 iteratorArithmetic: 0.900303 = 8968428 / 9961562 micros: 0.846298 = 377504 / 446065 Distribution: sawtooth_unique(100000) Example: [0, 3, 6, 9, 12, 15, 18, 1, 4, 7, 10, 13, 16, 19, 2, 5, 8, 11, 14, 17] move: 0.569953 = 2912465 / 5110007 moveAssign: 0.577701 = 6012427 / 10407511 compare: 0.824488 = 1604387 / 1945920 iteratorDeref: 0.678979 = 9299794 / 13696724 iteratorCopy: 0.72546 = 26157352 / 36056250 iteratorEquals: 2.00683 = 4131766 / 2058856 iteratorInc: 1.10728 = 6498257 / 5868688 iteratorDec: 0.111225 = 552519 / 4967557 iteratorDiff: 0.405328 = 988144 / 2437889 iteratorSub: 0.026249 = 6327 / 241038 iteratorAdd: 0.089034 = 55918 / 628052 iteratorAddAssign: 1.18737 = 2297233 / 1934717 iteratorArithmetic: 0.801143 = 14530164 / 18136797 micros: 0.706282 = 583937 / 826776 Distribution: sawtooth_dupes(100000) Example: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3] move: 0.547721 = 2348873 / 4288447 moveAssign: 0.543252 = 4613817 / 8492965 compare: 0.664642 = 841503 / 1266099 iteratorDeref: 0.599656 = 6292282 / 10493159 iteratorCopy: 0.60663 = 15336816 / 25281976 iteratorEquals: 2.24803 = 2859760 / 1272118 iteratorInc: 1.07142 = 4999546 / 4666276 iteratorDec: 0.0490551 = 205953 / 4198405 iteratorDiff: 0.314518 = 473157 / 1504390 iteratorSub: 0.0371951 = 5998 / 161258 iteratorAdd: 0.0875684 = 34672 / 395942 iteratorAddAssign: 0.900009 = 1233077 / 1370072 iteratorArithmetic: 0.72316 = 9812163 / 13568461 micros: 0.610891 = 353102 / 578011 Distribution: increasing_with_swap(100000) Example: [19, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 0] move: 1 = 291782 / 291782 moveAssign: 1 = 491779 / 491779 compare: 0.57449 = 225064 / 391763 iteratorDeref: 0.819553 = 941922 / 1149312 iteratorCopy: 0.377964 = 1392386 / 3683915 iteratorEquals: 0.525758 = 117408 / 223312 iteratorInc: 0.863708 = 333050 / 385605 iteratorDec: 0.999753 = 291716 / 291788 iteratorDiff: 0.224408 = 42528 / 189512 iteratorSub: 0.888889 = 104 / 117 iteratorAdd: 0.997151 = 16801 / 16849 iteratorAddAssign: 0.302604 = 82864 / 273836 iteratorArithmetic: 0.640448 = 884471 / 1381019 micros: 0.616124 = 36233 / 58808 Distribution: increasing_fast_slow(100000) Example: [0, 4, 8, 12, 0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18] move: 0.640348 = 825772 / 1289568 moveAssign: 0.627081 = 1559792 / 2487384 compare: 0.714235 = 431286 / 603843 iteratorDeref: 0.694913 = 2421875 / 3485151 iteratorCopy: 0.637176 = 5645877 / 8860786 iteratorEquals: 1.7552 = 953880 / 543458 iteratorInc: 1.08941 = 1676609 / 1539005 iteratorDec: 0.135904 = 154533 / 1137073 iteratorDiff: 0.388132 = 192787 / 496705 iteratorSub: 0.00577964 = 141 / 24396 iteratorAdd: 0.207287 = 17272 / 83324 iteratorAddAssign: 0.849015 = 431320 / 508024 iteratorArithmetic: 0.790987 = 3426542 / 4331985 micros: 0.777699 = 153552 / 197444 #include <iostream> #include <stdio.h> #include <vector> #include <cstdlib> #include <algorithm> #include <functional> #include <ranges> #include <chrono> #include <cmath> #include <numeric> using namespace std; #define LINE0 0 #define cout_line (cout << "line " << __LINE__ - LINE0 << ": ") #define vout(v) #v << " = " << (v) #define print1(v) cout_line << vout(v) << endl #define print2(v1, v2) cout_line << vout(v1) << ", " << vout(v2) << endl #define print3(v1, v2, v3) cout_line << vout(v1) << ", " << vout(v2) << ", " << vout(v3) << endl #define GET_MACRO(_1,_2,_3,NAME,...) NAME #define print(...) GET_MACRO(__VA_ARGS__, print3, print2, print1)(__VA_ARGS__) template <class Ch, class Tr, class Container> basic_ostream <Ch, Tr> & operator << (basic_ostream <Ch, Tr> & os, Container const& x); template <class X, class Y> ostream & operator << (ostream & os, pair <X, Y> const& p) { return os << p.first << ":" << p.second; } template <class Ch, class Tr, class Container> basic_ostream <Ch, Tr> & operator << (basic_ostream <Ch, Tr> & os, Container const& x) { os << "["; bool first = true; for(auto const& y : x) { if (!first) os << ", "; os << y; first = false; } return os << "]"; } struct Counter { int construct = 0; int copy = 0; int assign = 0; int move = 0; int moveAssign = 0; int compare = 0; int iteratorDeref = 0; int iteratorCopy = 0; int iteratorEquals = 0; int iteratorInc = 0; int iteratorDec = 0; int iteratorDiff = 0; int iteratorSub = 0; int iteratorAdd = 0; int iteratorAddAssign = 0; int iteratorArithmetic = 0; int micros = 0; }; int next_element_id = 0; class Element { Counter* counter; public: int val; int id; // Distinguis identical values to verify stability. Element(int val, Counter* counter) : val(val), counter(counter), id(++next_element_id) { ++counter->construct; } Element(Element const& other) : val(other.val), counter(other.counter), id(other.id) { ++counter->copy; } Element& operator=(Element const& other) { val = other.val; counter = other.counter; id = other.id; ++counter->assign; return *this; } Element(Element&& other) noexcept : val(move(other.val)), counter(other.counter), id(other.id) { ++counter->move; } Element& operator=(Element&& other) noexcept { val = move(other.val); counter = other.counter; id = other.id; ++counter->moveAssign; return *this; } bool operator<(Element const& other) const { ++counter->compare; return val < other.val; } bool operator==(Element const& other) const { return val == other.val && id == other.id; } }; ostream & operator << (ostream & os, Element const& e) { return os << e.val << "(" << e.id << ")"; } class RandomIterator { typedef vector<Element>::iterator It; It it; Counter* counter; public: using iterator_category = std::random_access_iterator_tag; using difference_type = std::ptrdiff_t; using value_type = Element; using pointer = value_type*; using reference = value_type&; RandomIterator(It it, Counter* counter) : it(it), counter(counter) { } RandomIterator(RandomIterator const& other) : it(other.it), counter(other.counter) { counter->iteratorCopy += 1; } bool operator== (RandomIterator other) { counter->iteratorEquals += 1; counter->iteratorArithmetic += 1; return it == other.it; } RandomIterator operator++ () { counter->iteratorInc += 1; counter->iteratorArithmetic += 1; return RandomIterator(++it, counter); } RandomIterator operator-- () { counter->iteratorDec += 1; counter->iteratorArithmetic += 1; return RandomIterator(--it, counter); } Element& operator* () { counter->iteratorDeref += 1; return *it; } difference_type operator- (RandomIterator other) { counter->iteratorDiff += 1; counter->iteratorArithmetic += 1; return it - other.it; } RandomIterator operator- (difference_type diff) { counter->iteratorSub += 1; counter->iteratorArithmetic += 1; return RandomIterator(it - diff, counter); } RandomIterator operator+ (difference_type diff) { counter->iteratorAdd += 1; counter->iteratorArithmetic += 1; return RandomIterator(it + diff, counter); } RandomIterator operator+= (difference_type diff) { counter->iteratorAddAssign += 1; counter->iteratorArithmetic += 1; return RandomIterator(it += diff, counter); } }; class TestCase { Counter counter; vector<Element> elements; public: TestCase(vector<int> const& data) { elements.reserve(data.size()); for (int i: data) { elements.push_back(Element(i, &counter)); } } Counter sort( void algo(RandomIterator, RandomIterator), vector<Element>::iterator begin, vector<Element>::iterator end) { counter = Counter(); auto t0 = std::chrono::system_clock::now(); algo(RandomIterator(begin, &counter), RandomIterator(end, &counter)); auto t1 = std::chrono::system_clock::now(); counter.micros = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count(); return counter; } pair<Counter,Counter> benchmark( void new_algo(RandomIterator, RandomIterator), void cur_algo(RandomIterator, RandomIterator)) { vector<Element> toSort1 = elements; Counter counter1 = sort(new_algo, toSort1.begin(), toSort1.end()); vector<Element> toSort2 = elements; Counter counter2 = sort(cur_algo, toSort2.begin(), toSort2.end()); if (toSort1 != toSort2) { cout << "INCORRECT RESULT!" << endl; exit(1); } return make_pair(counter1, counter2); } }; void relative_metric(string name, int m1, int m2) { if (m1 == 0 && m2 == 0) return; if (m2 == 0) { cout << name << ": inf = " << m1 << " / " << m2 << endl; } else { cout << name << ": " << m1 / (double) m2 << " = " << m1 << " / " << m2 << endl; } } void relative_report(Counter const& c1, Counter const& c2) { cout << endl; relative_metric("construct", c1.construct, c2.construct); relative_metric("copy", c1.copy, c2.copy); relative_metric("assign", c1.assign, c2.assign); relative_metric("move", c1.move, c2.move); relative_metric("moveAssign", c1.moveAssign, c2.moveAssign); relative_metric("compare", c1.compare, c2.compare); relative_metric("iteratorDeref", c1.iteratorDeref, c2.iteratorDeref); relative_metric("iteratorCopy", c1.iteratorCopy, c2.iteratorCopy); relative_metric("iteratorEquals", c1.iteratorEquals, c2.iteratorEquals); relative_metric("iteratorInc", c1.iteratorInc, c2.iteratorInc); relative_metric("iteratorDec", c1.iteratorDec, c2.iteratorDec); relative_metric("iteratorDiff", c1.iteratorDiff, c2.iteratorDiff); relative_metric("iteratorSub", c1.iteratorSub, c2.iteratorSub); relative_metric("iteratorAdd", c1.iteratorAdd, c2.iteratorAdd); relative_metric("iteratorAddAssign", c1.iteratorAddAssign, c2.iteratorAddAssign); relative_metric("iteratorArithmetic", c1.iteratorArithmetic, c2.iteratorArithmetic); relative_metric("micros", c1.micros, c2.micros); cout << endl; } /// This is a helper function for the merge routines. template<typename _BidirectionalIterator, typename _Distance, typename _Compare> void _my_merge_without_buffer( _BidirectionalIterator __first, _BidirectionalIterator __middle, _BidirectionalIterator __last, _Distance __len1, _Distance __len2, _Compare __comp) { if (__len1 == 0 || __len2 == 0) return; if (__len1 + __len2 == 2) { if (__comp(__middle, __first)) std::iter_swap(__first, __middle); return; } _BidirectionalIterator __first_cut = __first; _BidirectionalIterator __second_cut = __middle; _Distance __len11 = 0; _Distance __len22 = 0; #define NEW_ALGO 1 #ifdef NEW_ALGO // Prevent stack overflow if (__len1 > 4 * __len2 || __len2 > 4 * __len1) { #endif if (__len1 > __len2) { __len11 = __len1 / 2; std::advance(__first_cut, __len11); __second_cut = std::__lower_bound(__middle, __last, *__first_cut, __gnu_cxx::__ops::__iter_comp_val(__comp)); __len22 = std::distance(__middle, __second_cut); } else { __len22 = __len2 / 2; std::advance(__second_cut, __len22); __first_cut = std::__upper_bound(__first, __middle, *__second_cut, __gnu_cxx::__ops::__val_comp_iter(__comp)); __len11 = std::distance(__first, __first_cut); } #ifdef NEW_ALGO } else { _Distance __cut_range; // __cut_len is always equal to __middle - __first_cut (and eventually to // __second_cut - __middle) but we keep track of it to avoid calculating it // later which can be expensive for bidirectional iterators. _Distance __cut_len; if (__len1 > __len2) { std::advance(__first_cut, __len1 - __len2); __cut_range = __len2; } else { __cut_range = __len1; } __cut_len = __cut_range; while (__cut_range > 0) { _Distance __half_cut_range = __cut_range / 2; _BidirectionalIterator __mid_first_cut = __first_cut; std::advance(__mid_first_cut, __cut_range - __half_cut_range - 1); _BidirectionalIterator __mid_second_cut = __second_cut; std::advance(__mid_second_cut, __half_cut_range); if (__comp(__mid_second_cut, __mid_first_cut)) { __cut_range -= __half_cut_range + 1; __second_cut = __mid_second_cut; ++__second_cut; } else { __cut_len -= __cut_range - __half_cut_range; __cut_range = __half_cut_range; __first_cut = __mid_first_cut; ++__first_cut; } } __len22 = __cut_len; __len11 = __len1 - __cut_len; } #endif _BidirectionalIterator __new_middle = std::rotate(__first_cut, __middle, __second_cut); _my_merge_without_buffer(__first, __first_cut, __new_middle, __len11, __len22, __comp); _my_merge_without_buffer(__new_middle, __second_cut, __last, __len1 - __len11, __len2 - __len22, __comp); } /// This is a helper function for the stable sorting routines. template<typename _RandomAccessIterator, typename _Compare> void _my_inplace_stable_sort(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp) { if (__last - __first < 15) { std::__insertion_sort(__first, __last, __comp); return; } _RandomAccessIterator __middle = __first + (__last - __first) / 2; _my_inplace_stable_sort(__first, __middle, __comp); _my_inplace_stable_sort(__middle, __last, __comp); _my_merge_without_buffer(__first, __middle, __last, __middle - __first, __last - __middle, __comp); } template<typename Iterator> void my_inplace_stable_sort(Iterator start, Iterator end) { _my_inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less_iter()); } template<typename Iterator> void std_inplace_stable_sort(Iterator start, Iterator end) { std::__inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less_iter()); } void benchmark(string const& dist_name, vector<int> dist_func(int), int n) { vector<int> v = dist_func(n); cout << "Distribution: " << dist_name << "(" << n << ")" << endl; cout << "Example: " << dist_func(20) << endl; TestCase testCase(v); pair<Counter, Counter> counter_pair = testCase.benchmark( my_inplace_stable_sort , std_inplace_stable_sort ); Counter cKloet = counter_pair.first; Counter cInplace = counter_pair.second; relative_report(cKloet, cInplace); } vector<int> random_unique(int n) { vector<int> v(n); for (int i = 0; i < n; ++i) { v[i] = i; } for (int i = n - 1; i >= 0; --i) { swap(v[i], v[rand() % (i + 1)]); } return v; } vector<int> random_dupes(int n) { vector<int> v(n); int r = (int)sqrt(n); for (int i = 0; i < n; ++i) { v[i] = i / r; } for (int i = n - 1; i >= 0; --i) { swap(v[i], v[rand() % (i + 1)]); } return v; } vector<int> increasing_unique(int n) { vector<int> v(n); for (int i = 0; i < n; ++i) { v[i] = i; } return v; } vector<int> increasing_dupes(int n) { vector<int> v(n); int r = (int)sqrt(n); for (int i = 0; i < n; ++i) { v[i] = i / r; } return v; } vector<int> mostly_increasing(int n) { vector<int> v(n); int r = (int)sqrt(n); for (int i = 0; i < n; ++i) { v[i] = i + rand() % r; } return v; } vector<int> decreasing_unique(int n) { vector<int> v(n); for (int i = 0; i < n; ++i) { v[i] = n - 1 - i; } return v; } vector<int> decreasing_dupes(int n) { vector<int> v(n); int r = (int)sqrt(n); for (int i = 0; i < n; ++i) { v[i] = (n - 1 - i) / r; } return v; } vector<int> mostly_decreasing(int n) { vector<int> v(n); int r = (int)sqrt(n); for (int i = 0; i < n; ++i) { v[i] = n - 1 - i + rand() % r; } return v; } vector<int> sawtooth_unique(int n) { int r = (int)sqrt(n); while (gcd(r, n) != 1) ++r; vector<int> v(n); for (int i = 0; i < n; ++i) { v[(i * r) % n] = i; } return v; } vector<int> sawtooth_dupes(int n) { int r = (int)sqrt(n); vector<int> v(n); for (int i = 0; i < n; ++i) { v[i] = i % r; } return v; } vector<int> increasing_with_swap(int n) { vector<int> v(n); for (int i = 0; i < n; ++i) { v[i] = i; } swap(v[0], v[n - 1]); return v; } vector<int> increasing_fast_slow(int n) { int r = (int)sqrt(n); vector<int> v(n); for (int i = 0; i < r; ++i) { v[i] = i * r; } for (int i = r; i < n; ++i) { v[i] = (i - r) * n / (n - r); } return v; } #define BENCHMARK(dist, n) benchmark(#dist, dist, n) int main() { int n = 100000; BENCHMARK(random_unique, n); BENCHMARK(random_dupes, n); BENCHMARK(increasing_unique, n); BENCHMARK(increasing_dupes, n); BENCHMARK(mostly_increasing, n); BENCHMARK(decreasing_unique, n); BENCHMARK(decreasing_dupes, n); BENCHMARK(mostly_decreasing, n); BENCHMARK(sawtooth_unique, n); BENCHMARK(sawtooth_dupes, n); BENCHMARK(increasing_with_swap, n); BENCHMARK(increasing_fast_slow, n); return 0; } ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: __inplace_stable_sort improvement 2021-11-18 12:58 __inplace_stable_sort improvement David de Kloet @ 2021-11-24 18:33 ` François Dumont 2021-11-24 18:48 ` David de Kloet 0 siblings, 1 reply; 6+ messages in thread From: François Dumont @ 2021-11-24 18:33 UTC (permalink / raw) To: David de Kloet, libstdc++ On 18/11/21 1:58 pm, David de Kloet via Libstdc++ wrote: > Hello libstdc++, > > Apologies if this is the wrong list for my message. I believe I have an > improvement for __inplace_stable_sort and I'm wondering if this is worth > pursuing and if so how to proceed. Of course, anyone interested in enhancing the library is welcome. The best way to do so would be to provide a patch on the current library code. I'm the last one to have touch to this algo and it would be easier for me to understand if you try to implement it directly on the library. The good news is that you are trying to enhance a piece of code rarely used as it is only when the buffer allocation fails. So it is less critical and so perhaps more open to modifications. During my last work on it I've added a performance test: performance/25_algorithms/inplace_merge.cc. It is also supposed to check performance when we fail to get a buffer and so use __merge_without_buffer. Use 'make check-performance' to run it, see library manual guide. Do not hesitate to change it if you think that your benches are better. > __inplace_stable_sort does a merge sort where in each recursive step the 2 > sublists are merged in-place. The merge step is itself also recursive. In order > to merge [A, B) and [B, C), a tail section S1 of [A, B) and a head section S2 > of [B, C) are found such that all elements in S1 are greater than all elements > in S2 and then those sections are swapped/rotated and the shorter sections are > merged recursively. This is done by the function __merge_without_buffer. > > In the current implementation, __merge_without_buffer takes the half point of > the longer section and does a binary search (via lower_bound or upper_bound) to > find the corresponding cut point on the other section. Then it rotates the > sections. > > My proposal is to find the cut points such that both rotated sections have the > same length such that they can always simply be swapped. What do you mean by swapped ? I see in the code below that you are just changing how the cut positions are computed and then call the same std::rotate algo. I cannot see any reason why changing the position would make the rotate faster. > I have implemented > this and it seems to perform better in general and in the special cases that > I've tried. > > Unfortunately, since we are no longer guaranteed to halve the longer section in > each step, there is a possibility for a stack overflow. To avoid this, the new > algorithm falls back to the old behavior whenever the sections have > significantly different lengths. > > Also, where the current algorithm performs the binary search implicitly via > lower_bound/upper_bound, the new algorithm has to implement the binary search > explicitly because it has to look at both sections to determine the next step > in the search. > > I have created wrappers for my value objects as well as my iterators to count > the number of operations performed by the current and my algorithm and run > benchmarks on various distributions. My code is at the bottom but below is the > output as run on my machine. For each operation it outputs the ratio between > the number of operations performed by my algorithm over the number performed by > the current algorithm. > > Thanks for your attention, > David > > > Distribution: random_unique(100000) > Example: [19, 13, 5, 0, 10, 6, 15, 11, 16, 9, 14, 18, 4, 17, 1, 7, 8, 12, 3, 2] > > move: 0.588026 = 3331848 / 5666154 > moveAssign: 0.594789 = 6852819 / 11521431 > compare: 0.914997 = 2028977 / 2217469 > iteratorDeref: 0.718466 = 11001863 / 15312994 > iteratorCopy: 0.791332 = 33977189 / 42936712 > iteratorEquals: 2.27841 = 5174606 / 2271143 > iteratorInc: 1.2143 = 7697971 / 6339449 > iteratorDec: 0.0833529 = 473556 / 5681340 > iteratorDiff: 0.444865 = 1362598 / 3062950 > iteratorSub: 0.0200683 = 6272 / 312533 > iteratorAdd: 0.0695467 = 57149 / 821736 > iteratorAddAssign: 1.36491 = 3141949 / 2301950 > iteratorArithmetic: 0.861623 = 17914101 / 20791101 > micros: 0.663704 = 650825 / 980596 > > Distribution: random_dupes(100000) > Example: [4, 4, 1, 3, 0, 3, 2, 2, 4, 4, 0, 3, 1, 0, 3, 2, 0, 2, 1, 1] > > move: 0.570041 = 2949623 / 5174401 > moveAssign: 0.577736 = 6087819 / 10537375 > compare: 0.845419 = 1530877 / 1810790 > iteratorDeref: 0.678233 = 9232869 / 13613114 > iteratorCopy: 0.742699 = 25652244 / 34539225 > iteratorEquals: 2.41643 = 4123881 / 1706601 > iteratorInc: 1.15617 = 6554374 / 5669028 > iteratorDec: 0.101004 = 539991 / 5346219 > iteratorDiff: 0.417093 = 959972 / 2301580 > iteratorSub: 0.0350046 = 8882 / 253738 > iteratorAdd: 0.0972571 = 64122 / 659304 > iteratorAddAssign: 1.2279 = 2247197 / 1830107 > iteratorArithmetic: 0.81605 = 14498419 / 17766577 > micros: 0.699386 = 554426 / 792732 > > Distribution: increasing_unique(100000) > Example: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] > > move: 1 = 91808 / 91808 > moveAssign: 1 = 91808 / 91808 > compare: 0.573591 = 224558 / 391495 > iteratorDeref: 0.722385 = 540924 / 748803 > iteratorCopy: 0.298175 = 976735 / 3275708 > iteratorEquals: 0.522302 = 116383 / 222827 > iteratorInc: 0.715444 = 132750 / 185549 > iteratorDec: 1 = 91808 / 91808 > iteratorDiff: 0.217455 = 40956 / 188342 > iteratorAdd: 1 = 16383 / 16383 > iteratorAddAssign: 0.299523 = 81884 / 273381 > iteratorArithmetic: 0.49082 = 480164 / 978290 > micros: 0.479792 = 22935 / 47802 > > Distribution: increasing_dupes(100000) > Example: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4] > > move: 1 = 91808 / 91808 > moveAssign: 1 = 91808 / 91808 > compare: 0.573591 = 224558 / 391495 > iteratorDeref: 0.722385 = 540924 / 748803 > iteratorCopy: 0.298175 = 976735 / 3275708 > iteratorEquals: 0.522302 = 116383 / 222827 > iteratorInc: 0.715444 = 132750 / 185549 > iteratorDec: 1 = 91808 / 91808 > iteratorDiff: 0.217455 = 40956 / 188342 > iteratorAdd: 1 = 16383 / 16383 > iteratorAddAssign: 0.299523 = 81884 / 273381 > iteratorArithmetic: 0.49082 = 480164 / 978290 > micros: 0.449154 = 20675 / 46031 > > Distribution: mostly_increasing(100000) > Example: [3, 2, 3, 4, 5, 5, 6, 7, 9, 9, 11, 13, 12, 14, 15, 17, 17, 18, 21, 20] > > move: 0.681093 = 814725 / 1196203 > moveAssign: 0.703421 = 1809563 / 2572519 > compare: 0.924016 = 965320 / 1044700 > iteratorDeref: 0.847765 = 3802634 / 4485480 > iteratorCopy: 0.804068 = 11859111 / 14748894 > iteratorEquals: 1.59826 = 1388752 / 868913 > iteratorInc: 1.25659 = 1818421 / 1447111 > iteratorDec: 0.383738 = 537108 / 1399674 > iteratorDiff: 0.486488 = 562273 / 1155779 > iteratorSub: 0.0638659 = 6666 / 104375 > iteratorAdd: 0.184194 = 55586 / 301780 > iteratorAddAssign: 1.30369 = 1105439 / 847932 > iteratorArithmetic: 0.893672 = 5474245 / 6125564 > micros: 0.790603 = 233953 / 295917 > > Distribution: decreasing_unique(100000) > Example: [19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0] > > move: 0.40597 = 995312 / 2451688 > moveAssign: 0.45785 = 2459840 / 5372592 > compare: 0.489247 = 151631 / 309927 > iteratorDeref: 0.524798 = 3415934 / 6509043 > iteratorCopy: 0.563628 = 5971275 / 10594360 > iteratorEquals: 1.18395 = 843533 / 712475 > iteratorInc: 0.864454 = 1460463 / 1689462 > iteratorDec: 0.373125 = 1606976 / 4306800 > iteratorDiff: 0.495226 = 235930 / 476409 > iteratorSub: 0.248254 = 13824 / 55685 > iteratorAdd: 0.590877 = 135839 / 229894 > iteratorAddAssign: 0.421852 = 119646 / 283621 > iteratorArithmetic: 0.569514 = 4416211 / 7754346 > micros: 0.543145 = 167865 / 309061 > > Distribution: decreasing_dupes(100000) > Example: [4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0] > > move: 0.741695 = 1455276 / 1962096 > moveAssign: 0.736048 = 2826598 / 3840238 > compare: 0.641679 = 282851 / 440798 > iteratorDeref: 0.740565 = 3386062 / 4572271 > iteratorCopy: 0.628798 = 5008472 / 7965150 > iteratorEquals: 1.50206 = 794354 / 528843 > iteratorInc: 1.39602 = 2164546 / 1550510 > iteratorDec: 0.336639 = 837728 / 2488504 > iteratorDiff: 0.313457 = 88552 / 282501 > iteratorSub: 0.165388 = 3880 / 23460 > iteratorAdd: 0.377644 = 27139 / 71864 > iteratorAddAssign: 0.551364 = 182383 / 330785 > iteratorArithmetic: 0.776766 = 4098582 / 5276467 > micros: 0.71282 = 160398 / 225019 > > Distribution: mostly_decreasing(100000) > Example: [20, 19, 20, 18, 18, 15, 13, 13, 14, 13, 10, 9, 10, 8, 5, 7, > 6, 4, 3, 2] > > move: 0.760602 = 2197435 / 2889075 > moveAssign: 0.768473 = 4591313 / 5974593 > compare: 0.932913 = 977678 / 1047984 > iteratorDeref: 0.838641 = 6633169 / 7909424 > iteratorCopy: 0.829028 = 15447960 / 18633830 > iteratorEquals: 1.7009 = 2000686 / 1176251 > iteratorInc: 1.52771 = 3788203 / 2479657 > iteratorDec: 0.363843 = 1369990 / 3765332 > iteratorDiff: 0.489186 = 595472 / 1217270 > iteratorSub: 0.0877827 = 10925 / 124455 > iteratorAdd: 0.201991 = 69640 / 344767 > iteratorAddAssign: 1.32756 = 1133512 / 853830 > iteratorArithmetic: 0.900303 = 8968428 / 9961562 > micros: 0.846298 = 377504 / 446065 > > Distribution: sawtooth_unique(100000) > Example: [0, 3, 6, 9, 12, 15, 18, 1, 4, 7, 10, 13, 16, 19, 2, 5, 8, 11, 14, 17] > > move: 0.569953 = 2912465 / 5110007 > moveAssign: 0.577701 = 6012427 / 10407511 > compare: 0.824488 = 1604387 / 1945920 > iteratorDeref: 0.678979 = 9299794 / 13696724 > iteratorCopy: 0.72546 = 26157352 / 36056250 > iteratorEquals: 2.00683 = 4131766 / 2058856 > iteratorInc: 1.10728 = 6498257 / 5868688 > iteratorDec: 0.111225 = 552519 / 4967557 > iteratorDiff: 0.405328 = 988144 / 2437889 > iteratorSub: 0.026249 = 6327 / 241038 > iteratorAdd: 0.089034 = 55918 / 628052 > iteratorAddAssign: 1.18737 = 2297233 / 1934717 > iteratorArithmetic: 0.801143 = 14530164 / 18136797 > micros: 0.706282 = 583937 / 826776 > > Distribution: sawtooth_dupes(100000) > Example: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3] > > move: 0.547721 = 2348873 / 4288447 > moveAssign: 0.543252 = 4613817 / 8492965 > compare: 0.664642 = 841503 / 1266099 > iteratorDeref: 0.599656 = 6292282 / 10493159 > iteratorCopy: 0.60663 = 15336816 / 25281976 > iteratorEquals: 2.24803 = 2859760 / 1272118 > iteratorInc: 1.07142 = 4999546 / 4666276 > iteratorDec: 0.0490551 = 205953 / 4198405 > iteratorDiff: 0.314518 = 473157 / 1504390 > iteratorSub: 0.0371951 = 5998 / 161258 > iteratorAdd: 0.0875684 = 34672 / 395942 > iteratorAddAssign: 0.900009 = 1233077 / 1370072 > iteratorArithmetic: 0.72316 = 9812163 / 13568461 > micros: 0.610891 = 353102 / 578011 > > Distribution: increasing_with_swap(100000) > Example: [19, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 0] > > move: 1 = 291782 / 291782 > moveAssign: 1 = 491779 / 491779 > compare: 0.57449 = 225064 / 391763 > iteratorDeref: 0.819553 = 941922 / 1149312 > iteratorCopy: 0.377964 = 1392386 / 3683915 > iteratorEquals: 0.525758 = 117408 / 223312 > iteratorInc: 0.863708 = 333050 / 385605 > iteratorDec: 0.999753 = 291716 / 291788 > iteratorDiff: 0.224408 = 42528 / 189512 > iteratorSub: 0.888889 = 104 / 117 > iteratorAdd: 0.997151 = 16801 / 16849 > iteratorAddAssign: 0.302604 = 82864 / 273836 > iteratorArithmetic: 0.640448 = 884471 / 1381019 > micros: 0.616124 = 36233 / 58808 > > Distribution: increasing_fast_slow(100000) > Example: [0, 4, 8, 12, 0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18] > > move: 0.640348 = 825772 / 1289568 > moveAssign: 0.627081 = 1559792 / 2487384 > compare: 0.714235 = 431286 / 603843 > iteratorDeref: 0.694913 = 2421875 / 3485151 > iteratorCopy: 0.637176 = 5645877 / 8860786 > iteratorEquals: 1.7552 = 953880 / 543458 > iteratorInc: 1.08941 = 1676609 / 1539005 > iteratorDec: 0.135904 = 154533 / 1137073 > iteratorDiff: 0.388132 = 192787 / 496705 > iteratorSub: 0.00577964 = 141 / 24396 > iteratorAdd: 0.207287 = 17272 / 83324 > iteratorAddAssign: 0.849015 = 431320 / 508024 > iteratorArithmetic: 0.790987 = 3426542 / 4331985 > micros: 0.777699 = 153552 / 197444 > > > > #include <iostream> > #include <stdio.h> > #include <vector> > #include <cstdlib> > #include <algorithm> > #include <functional> > #include <ranges> > #include <chrono> > #include <cmath> > #include <numeric> > > using namespace std; > > #define LINE0 0 > #define cout_line (cout << "line " << __LINE__ - LINE0 << ": ") > #define vout(v) #v << " = " << (v) > #define print1(v) cout_line << vout(v) << endl > #define print2(v1, v2) cout_line << vout(v1) << ", " << vout(v2) << endl > #define print3(v1, v2, v3) cout_line << vout(v1) << ", " << vout(v2) > << ", " << vout(v3) << endl > #define GET_MACRO(_1,_2,_3,NAME,...) NAME > #define print(...) GET_MACRO(__VA_ARGS__, print3, print2, print1)(__VA_ARGS__) > > template <class Ch, class Tr, class Container> > basic_ostream <Ch, Tr> & operator << (basic_ostream <Ch, Tr> & os, > Container const& x); > > template <class X, class Y> > ostream & operator << (ostream & os, pair <X, Y> const& p) { > return os << p.first << ":" << p.second; > } > > template <class Ch, class Tr, class Container> > basic_ostream <Ch, Tr> & operator << (basic_ostream <Ch, Tr> & os, > Container const& x) { > os << "["; > bool first = true; > for(auto const& y : x) { > if (!first) os << ", "; > os << y; > first = false; > } > return os << "]"; > } > > > struct Counter { > int construct = 0; > int copy = 0; > int assign = 0; > int move = 0; > int moveAssign = 0; > int compare = 0; > int iteratorDeref = 0; > int iteratorCopy = 0; > int iteratorEquals = 0; > int iteratorInc = 0; > int iteratorDec = 0; > int iteratorDiff = 0; > int iteratorSub = 0; > int iteratorAdd = 0; > int iteratorAddAssign = 0; > int iteratorArithmetic = 0; > int micros = 0; > }; > > int next_element_id = 0; > > class Element { > Counter* counter; > public: > int val; > int id; // Distinguis identical values to verify stability. > Element(int val, Counter* counter) > : val(val), counter(counter), id(++next_element_id) { > ++counter->construct; > } > > Element(Element const& other) > : val(other.val), counter(other.counter), id(other.id) { > ++counter->copy; > } > > Element& operator=(Element const& other) { > val = other.val; > counter = other.counter; > id = other.id; > ++counter->assign; > return *this; > } > > Element(Element&& other) noexcept > : val(move(other.val)), counter(other.counter), id(other.id) { > ++counter->move; > } > > Element& operator=(Element&& other) noexcept { > val = move(other.val); > counter = other.counter; > id = other.id; > ++counter->moveAssign; > return *this; > } > > bool operator<(Element const& other) const { > ++counter->compare; > return val < other.val; > } > > bool operator==(Element const& other) const { > return val == other.val && id == other.id; > } > }; > > ostream & operator << (ostream & os, Element const& e) { > return os << e.val << "(" << e.id << ")"; > } > > > class RandomIterator { > typedef vector<Element>::iterator It; > It it; > Counter* counter; > > public: > using iterator_category = std::random_access_iterator_tag; > using difference_type = std::ptrdiff_t; > using value_type = Element; > using pointer = value_type*; > using reference = value_type&; > > RandomIterator(It it, Counter* counter) : it(it), counter(counter) { > } > > RandomIterator(RandomIterator const& other) : it(other.it), > counter(other.counter) { > counter->iteratorCopy += 1; > } > > bool operator== (RandomIterator other) { > counter->iteratorEquals += 1; > counter->iteratorArithmetic += 1; > return it == other.it; > } > > RandomIterator operator++ () { > counter->iteratorInc += 1; > counter->iteratorArithmetic += 1; > return RandomIterator(++it, counter); > } > > RandomIterator operator-- () { > counter->iteratorDec += 1; > counter->iteratorArithmetic += 1; > return RandomIterator(--it, counter); > } > > Element& operator* () { > counter->iteratorDeref += 1; > return *it; > } > > difference_type operator- (RandomIterator other) { > counter->iteratorDiff += 1; > counter->iteratorArithmetic += 1; > return it - other.it; > } > > RandomIterator operator- (difference_type diff) { > counter->iteratorSub += 1; > counter->iteratorArithmetic += 1; > return RandomIterator(it - diff, counter); > } > > RandomIterator operator+ (difference_type diff) { > counter->iteratorAdd += 1; > counter->iteratorArithmetic += 1; > return RandomIterator(it + diff, counter); > } > > RandomIterator operator+= (difference_type diff) { > counter->iteratorAddAssign += 1; > counter->iteratorArithmetic += 1; > return RandomIterator(it += diff, counter); > } > }; > > > class TestCase { > Counter counter; > vector<Element> elements; > > public: > TestCase(vector<int> const& data) { > elements.reserve(data.size()); > for (int i: data) { > elements.push_back(Element(i, &counter)); > } > } > > Counter sort( > void algo(RandomIterator, RandomIterator), > vector<Element>::iterator begin, > vector<Element>::iterator end) { > counter = Counter(); > auto t0 = std::chrono::system_clock::now(); > > algo(RandomIterator(begin, &counter), RandomIterator(end, &counter)); > > auto t1 = std::chrono::system_clock::now(); > counter.micros = > std::chrono::duration_cast<std::chrono::microseconds>(t1 - > t0).count(); > return counter; > } > > pair<Counter,Counter> benchmark( > void new_algo(RandomIterator, RandomIterator), > void cur_algo(RandomIterator, RandomIterator)) { > vector<Element> toSort1 = elements; > Counter counter1 = sort(new_algo, toSort1.begin(), toSort1.end()); > > vector<Element> toSort2 = elements; > Counter counter2 = sort(cur_algo, toSort2.begin(), toSort2.end()); > > if (toSort1 != toSort2) { > cout << "INCORRECT RESULT!" << endl; > exit(1); > } > > return make_pair(counter1, counter2); > } > }; > > > void relative_metric(string name, int m1, int m2) { > if (m1 == 0 && m2 == 0) return; > if (m2 == 0) { > cout << name << ": inf = " << m1 << " / " << m2 << endl; > } else { > cout << name << ": " << m1 / (double) m2 << " = " << m1 << " / " > << m2 << endl; > } > } > > > void relative_report(Counter const& c1, Counter const& c2) { > cout << endl; > relative_metric("construct", c1.construct, c2.construct); > relative_metric("copy", c1.copy, c2.copy); > relative_metric("assign", c1.assign, c2.assign); > relative_metric("move", c1.move, c2.move); > relative_metric("moveAssign", c1.moveAssign, c2.moveAssign); > relative_metric("compare", c1.compare, c2.compare); > relative_metric("iteratorDeref", c1.iteratorDeref, c2.iteratorDeref); > relative_metric("iteratorCopy", c1.iteratorCopy, c2.iteratorCopy); > relative_metric("iteratorEquals", c1.iteratorEquals, c2.iteratorEquals); > relative_metric("iteratorInc", c1.iteratorInc, c2.iteratorInc); > relative_metric("iteratorDec", c1.iteratorDec, c2.iteratorDec); > relative_metric("iteratorDiff", c1.iteratorDiff, c2.iteratorDiff); > relative_metric("iteratorSub", c1.iteratorSub, c2.iteratorSub); > relative_metric("iteratorAdd", c1.iteratorAdd, c2.iteratorAdd); > relative_metric("iteratorAddAssign", c1.iteratorAddAssign, > c2.iteratorAddAssign); > relative_metric("iteratorArithmetic", c1.iteratorArithmetic, > c2.iteratorArithmetic); > relative_metric("micros", c1.micros, c2.micros); > cout << endl; > } > > > /// This is a helper function for the merge routines. > template<typename _BidirectionalIterator, typename _Distance, typename _Compare> > void _my_merge_without_buffer( > _BidirectionalIterator __first, > _BidirectionalIterator __middle, > _BidirectionalIterator __last, > _Distance __len1, _Distance __len2, > _Compare __comp) { > > if (__len1 == 0 || __len2 == 0) > return; > > if (__len1 + __len2 == 2) { > if (__comp(__middle, __first)) > std::iter_swap(__first, __middle); > return; > } > > _BidirectionalIterator __first_cut = __first; > _BidirectionalIterator __second_cut = __middle; > _Distance __len11 = 0; > _Distance __len22 = 0; > > > #define NEW_ALGO 1 > > #ifdef NEW_ALGO > // Prevent stack overflow > if (__len1 > 4 * __len2 || __len2 > 4 * __len1) { > #endif > if (__len1 > __len2) > { > __len11 = __len1 / 2; > std::advance(__first_cut, __len11); > __second_cut > = std::__lower_bound(__middle, __last, *__first_cut, > __gnu_cxx::__ops::__iter_comp_val(__comp)); > __len22 = std::distance(__middle, __second_cut); > } > else > { > __len22 = __len2 / 2; > std::advance(__second_cut, __len22); > __first_cut > = std::__upper_bound(__first, __middle, *__second_cut, > __gnu_cxx::__ops::__val_comp_iter(__comp)); > __len11 = std::distance(__first, __first_cut); > } > #ifdef NEW_ALGO > } else > { > _Distance __cut_range; > // __cut_len is always equal to __middle - __first_cut (and eventually to > // __second_cut - __middle) but we keep track of it to avoid calculating it > // later which can be expensive for bidirectional iterators. > _Distance __cut_len; > > if (__len1 > __len2) > { > std::advance(__first_cut, __len1 - __len2); > __cut_range = __len2; > } > else > { > __cut_range = __len1; > } > __cut_len = __cut_range; > > while (__cut_range > 0) { > _Distance __half_cut_range = __cut_range / 2; > _BidirectionalIterator __mid_first_cut = __first_cut; > std::advance(__mid_first_cut, __cut_range - __half_cut_range - 1); > _BidirectionalIterator __mid_second_cut = __second_cut; > std::advance(__mid_second_cut, __half_cut_range); > > if (__comp(__mid_second_cut, __mid_first_cut)) { > __cut_range -= __half_cut_range + 1; > __second_cut = __mid_second_cut; > ++__second_cut; > } else { > __cut_len -= __cut_range - __half_cut_range; > __cut_range = __half_cut_range; > __first_cut = __mid_first_cut; > ++__first_cut; > } > } > __len22 = __cut_len; > __len11 = __len1 - __cut_len; > } > #endif > > _BidirectionalIterator __new_middle > = std::rotate(__first_cut, __middle, __second_cut); > _my_merge_without_buffer(__first, __first_cut, __new_middle, > __len11, __len22, __comp); > _my_merge_without_buffer(__new_middle, __second_cut, __last, > __len1 - __len11, __len2 - __len22, __comp); > } > > /// This is a helper function for the stable sorting routines. > template<typename _RandomAccessIterator, typename _Compare> > void > _my_inplace_stable_sort(_RandomAccessIterator __first, > _RandomAccessIterator __last, _Compare __comp) > { > > if (__last - __first < 15) > { > std::__insertion_sort(__first, __last, __comp); > return; > } > _RandomAccessIterator __middle = __first + (__last - __first) / 2; > _my_inplace_stable_sort(__first, __middle, __comp); > _my_inplace_stable_sort(__middle, __last, __comp); > _my_merge_without_buffer(__first, __middle, __last, > __middle - __first, > __last - __middle, > __comp); > } > > > template<typename Iterator> > void my_inplace_stable_sort(Iterator start, Iterator end) { > _my_inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less_iter()); > } > > template<typename Iterator> > void std_inplace_stable_sort(Iterator start, Iterator end) { > std::__inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less_iter()); > } > > void benchmark(string const& dist_name, > vector<int> dist_func(int), > int n) { > vector<int> v = dist_func(n); > cout << "Distribution: " << dist_name << "(" << n << ")" << endl; > cout << "Example: " << dist_func(20) << endl; > TestCase testCase(v); > > pair<Counter, Counter> counter_pair = > testCase.benchmark( > my_inplace_stable_sort > , std_inplace_stable_sort > ); > Counter cKloet = counter_pair.first; > Counter cInplace = counter_pair.second; > > relative_report(cKloet, cInplace); > } > > > vector<int> random_unique(int n) { > vector<int> v(n); > for (int i = 0; i < n; ++i) { > v[i] = i; > } > for (int i = n - 1; i >= 0; --i) { > swap(v[i], v[rand() % (i + 1)]); > } > return v; > } > > vector<int> random_dupes(int n) { > vector<int> v(n); > int r = (int)sqrt(n); > for (int i = 0; i < n; ++i) { > v[i] = i / r; > } > for (int i = n - 1; i >= 0; --i) { > swap(v[i], v[rand() % (i + 1)]); > } > return v; > } > > vector<int> increasing_unique(int n) { > vector<int> v(n); > for (int i = 0; i < n; ++i) { > v[i] = i; > } > return v; > } > > > vector<int> increasing_dupes(int n) { > vector<int> v(n); > int r = (int)sqrt(n); > for (int i = 0; i < n; ++i) { > v[i] = i / r; > } > return v; > } > > > vector<int> mostly_increasing(int n) { > vector<int> v(n); > int r = (int)sqrt(n); > for (int i = 0; i < n; ++i) { > v[i] = i + rand() % r; > } > return v; > } > > > vector<int> decreasing_unique(int n) { > vector<int> v(n); > for (int i = 0; i < n; ++i) { > v[i] = n - 1 - i; > } > return v; > } > > vector<int> decreasing_dupes(int n) { > vector<int> v(n); > int r = (int)sqrt(n); > for (int i = 0; i < n; ++i) { > v[i] = (n - 1 - i) / r; > } > return v; > } > > vector<int> mostly_decreasing(int n) { > vector<int> v(n); > int r = (int)sqrt(n); > for (int i = 0; i < n; ++i) { > v[i] = n - 1 - i + rand() % r; > } > return v; > } > > vector<int> sawtooth_unique(int n) { > int r = (int)sqrt(n); > while (gcd(r, n) != 1) ++r; > vector<int> v(n); > for (int i = 0; i < n; ++i) { > v[(i * r) % n] = i; > } > return v; > } > > vector<int> sawtooth_dupes(int n) { > int r = (int)sqrt(n); > vector<int> v(n); > for (int i = 0; i < n; ++i) { > v[i] = i % r; > } > return v; > } > > vector<int> increasing_with_swap(int n) { > vector<int> v(n); > for (int i = 0; i < n; ++i) { > v[i] = i; > } > swap(v[0], v[n - 1]); > return v; > } > > vector<int> increasing_fast_slow(int n) { > int r = (int)sqrt(n); > vector<int> v(n); > for (int i = 0; i < r; ++i) { > v[i] = i * r; > } > for (int i = r; i < n; ++i) { > v[i] = (i - r) * n / (n - r); > } > return v; > } > > > #define BENCHMARK(dist, n) benchmark(#dist, dist, n) > > int main() { > int n = 100000; > BENCHMARK(random_unique, n); > BENCHMARK(random_dupes, n); > BENCHMARK(increasing_unique, n); > BENCHMARK(increasing_dupes, n); > BENCHMARK(mostly_increasing, n); > BENCHMARK(decreasing_unique, n); > BENCHMARK(decreasing_dupes, n); > BENCHMARK(mostly_decreasing, n); > BENCHMARK(sawtooth_unique, n); > BENCHMARK(sawtooth_dupes, n); > BENCHMARK(increasing_with_swap, n); > BENCHMARK(increasing_fast_slow, n); > > return 0; > } ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: __inplace_stable_sort improvement 2021-11-24 18:33 ` François Dumont @ 2021-11-24 18:48 ` David de Kloet 2021-11-25 5:35 ` François Dumont 0 siblings, 1 reply; 6+ messages in thread From: David de Kloet @ 2021-11-24 18:48 UTC (permalink / raw) To: François Dumont; +Cc: libstdc++ On Wed, Nov 24, 2021 at 7:33 PM François Dumont <frs.dumont@gmail.com> wrote: > > > On 18/11/21 1:58 pm, David de Kloet via Libstdc++ wrote: > > Hello libstdc++, > > > > Apologies if this is the wrong list for my message. I believe I have an > > improvement for __inplace_stable_sort and I'm wondering if this is worth > > pursuing and if so how to proceed. > > Of course, anyone interested in enhancing the library is welcome. > > The best way to do so would be to provide a patch on the current library > code. I'm the last one to have touch to this algo and it would be easier > for me to understand if you try to implement it directly on the library. _my_merge_without_buffer below preprocesses exactly to __merge_without_buffer from the library if you undefine NEW_ALGO so you can see exactly what my changes are. I currently don't have much time, but when I do I'd be happy to provide it as a patch. If you find time before me, you're also more than welcome to use the code. > The good news is that you are trying to enhance a piece of code rarely > used as it is only when the buffer allocation fails. So it is less > critical and so perhaps more open to modifications. > > During my last work on it I've added a performance test: > performance/25_algorithms/inplace_merge.cc. It is also supposed to check > performance when we fail to get a buffer and so use > __merge_without_buffer. Use 'make check-performance' to run it, see > library manual guide. Do not hesitate to change it if you think that > your benches are better. I don't currently have the environment to `make check-performance`. I just copied the code from the web and adjusted it. But (again) when I have time, I'll look into that. If you know of any "contributing to stdlibc++" guides, that would be great too. > > > __inplace_stable_sort does a merge sort where in each recursive step the 2 > > sublists are merged in-place. The merge step is itself also recursive. In order > > to merge [A, B) and [B, C), a tail section S1 of [A, B) and a head section S2 > > of [B, C) are found such that all elements in S1 are greater than all elements > > in S2 and then those sections are swapped/rotated and the shorter sections are > > merged recursively. This is done by the function __merge_without_buffer. > > > > In the current implementation, __merge_without_buffer takes the half point of > > the longer section and does a binary search (via lower_bound or upper_bound) to > > find the corresponding cut point on the other section. Then it rotates the > > sections. > > > > My proposal is to find the cut points such that both rotated sections have the > > same length such that they can always simply be swapped. > > What do you mean by swapped ? I mean the first element of section 1 gets swapped with the first element of section 2. The second element of section 1 gets swapped with the second element of section 2. Etc. for each index. > > I see in the code below that you are just > changing how the cut positions are computed and then call the same > std::rotate algo. I cannot see any reason why changing the position > would make the rotate faster. The implementation of rotate does a series of such section swaps. But if both section have the same length, it's only a single swap of sections which makes rotate about 1.5-2 times faster. The purpose of the code I added is to choose the cut positions such that the rotated sections are of equal length, which makes rotate faster by doing fewer element swaps, resulting in fewer calls to the copy/assignment constructor. > > I have implemented > > this and it seems to perform better in general and in the special cases that > > I've tried. > > > > Unfortunately, since we are no longer guaranteed to halve the longer section in > > each step, there is a possibility for a stack overflow. To avoid this, the new > > algorithm falls back to the old behavior whenever the sections have > > significantly different lengths. > > > > Also, where the current algorithm performs the binary search implicitly via > > lower_bound/upper_bound, the new algorithm has to implement the binary search > > explicitly because it has to look at both sections to determine the next step > > in the search. > > > > I have created wrappers for my value objects as well as my iterators to count > > the number of operations performed by the current and my algorithm and run > > benchmarks on various distributions. My code is at the bottom but below is the > > output as run on my machine. For each operation it outputs the ratio between > > the number of operations performed by my algorithm over the number performed by > > the current algorithm. > > > > Thanks for your attention, > > David > > > > > > Distribution: random_unique(100000) > > Example: [19, 13, 5, 0, 10, 6, 15, 11, 16, 9, 14, 18, 4, 17, 1, 7, 8, 12, 3, 2] > > > > move: 0.588026 = 3331848 / 5666154 > > moveAssign: 0.594789 = 6852819 / 11521431 > > compare: 0.914997 = 2028977 / 2217469 > > iteratorDeref: 0.718466 = 11001863 / 15312994 > > iteratorCopy: 0.791332 = 33977189 / 42936712 > > iteratorEquals: 2.27841 = 5174606 / 2271143 > > iteratorInc: 1.2143 = 7697971 / 6339449 > > iteratorDec: 0.0833529 = 473556 / 5681340 > > iteratorDiff: 0.444865 = 1362598 / 3062950 > > iteratorSub: 0.0200683 = 6272 / 312533 > > iteratorAdd: 0.0695467 = 57149 / 821736 > > iteratorAddAssign: 1.36491 = 3141949 / 2301950 > > iteratorArithmetic: 0.861623 = 17914101 / 20791101 > > micros: 0.663704 = 650825 / 980596 > > > > Distribution: random_dupes(100000) > > Example: [4, 4, 1, 3, 0, 3, 2, 2, 4, 4, 0, 3, 1, 0, 3, 2, 0, 2, 1, 1] > > > > move: 0.570041 = 2949623 / 5174401 > > moveAssign: 0.577736 = 6087819 / 10537375 > > compare: 0.845419 = 1530877 / 1810790 > > iteratorDeref: 0.678233 = 9232869 / 13613114 > > iteratorCopy: 0.742699 = 25652244 / 34539225 > > iteratorEquals: 2.41643 = 4123881 / 1706601 > > iteratorInc: 1.15617 = 6554374 / 5669028 > > iteratorDec: 0.101004 = 539991 / 5346219 > > iteratorDiff: 0.417093 = 959972 / 2301580 > > iteratorSub: 0.0350046 = 8882 / 253738 > > iteratorAdd: 0.0972571 = 64122 / 659304 > > iteratorAddAssign: 1.2279 = 2247197 / 1830107 > > iteratorArithmetic: 0.81605 = 14498419 / 17766577 > > micros: 0.699386 = 554426 / 792732 > > > > Distribution: increasing_unique(100000) > > Example: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] > > > > move: 1 = 91808 / 91808 > > moveAssign: 1 = 91808 / 91808 > > compare: 0.573591 = 224558 / 391495 > > iteratorDeref: 0.722385 = 540924 / 748803 > > iteratorCopy: 0.298175 = 976735 / 3275708 > > iteratorEquals: 0.522302 = 116383 / 222827 > > iteratorInc: 0.715444 = 132750 / 185549 > > iteratorDec: 1 = 91808 / 91808 > > iteratorDiff: 0.217455 = 40956 / 188342 > > iteratorAdd: 1 = 16383 / 16383 > > iteratorAddAssign: 0.299523 = 81884 / 273381 > > iteratorArithmetic: 0.49082 = 480164 / 978290 > > micros: 0.479792 = 22935 / 47802 > > > > Distribution: increasing_dupes(100000) > > Example: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4] > > > > move: 1 = 91808 / 91808 > > moveAssign: 1 = 91808 / 91808 > > compare: 0.573591 = 224558 / 391495 > > iteratorDeref: 0.722385 = 540924 / 748803 > > iteratorCopy: 0.298175 = 976735 / 3275708 > > iteratorEquals: 0.522302 = 116383 / 222827 > > iteratorInc: 0.715444 = 132750 / 185549 > > iteratorDec: 1 = 91808 / 91808 > > iteratorDiff: 0.217455 = 40956 / 188342 > > iteratorAdd: 1 = 16383 / 16383 > > iteratorAddAssign: 0.299523 = 81884 / 273381 > > iteratorArithmetic: 0.49082 = 480164 / 978290 > > micros: 0.449154 = 20675 / 46031 > > > > Distribution: mostly_increasing(100000) > > Example: [3, 2, 3, 4, 5, 5, 6, 7, 9, 9, 11, 13, 12, 14, 15, 17, 17, 18, 21, 20] > > > > move: 0.681093 = 814725 / 1196203 > > moveAssign: 0.703421 = 1809563 / 2572519 > > compare: 0.924016 = 965320 / 1044700 > > iteratorDeref: 0.847765 = 3802634 / 4485480 > > iteratorCopy: 0.804068 = 11859111 / 14748894 > > iteratorEquals: 1.59826 = 1388752 / 868913 > > iteratorInc: 1.25659 = 1818421 / 1447111 > > iteratorDec: 0.383738 = 537108 / 1399674 > > iteratorDiff: 0.486488 = 562273 / 1155779 > > iteratorSub: 0.0638659 = 6666 / 104375 > > iteratorAdd: 0.184194 = 55586 / 301780 > > iteratorAddAssign: 1.30369 = 1105439 / 847932 > > iteratorArithmetic: 0.893672 = 5474245 / 6125564 > > micros: 0.790603 = 233953 / 295917 > > > > Distribution: decreasing_unique(100000) > > Example: [19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0] > > > > move: 0.40597 = 995312 / 2451688 > > moveAssign: 0.45785 = 2459840 / 5372592 > > compare: 0.489247 = 151631 / 309927 > > iteratorDeref: 0.524798 = 3415934 / 6509043 > > iteratorCopy: 0.563628 = 5971275 / 10594360 > > iteratorEquals: 1.18395 = 843533 / 712475 > > iteratorInc: 0.864454 = 1460463 / 1689462 > > iteratorDec: 0.373125 = 1606976 / 4306800 > > iteratorDiff: 0.495226 = 235930 / 476409 > > iteratorSub: 0.248254 = 13824 / 55685 > > iteratorAdd: 0.590877 = 135839 / 229894 > > iteratorAddAssign: 0.421852 = 119646 / 283621 > > iteratorArithmetic: 0.569514 = 4416211 / 7754346 > > micros: 0.543145 = 167865 / 309061 > > > > Distribution: decreasing_dupes(100000) > > Example: [4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0] > > > > move: 0.741695 = 1455276 / 1962096 > > moveAssign: 0.736048 = 2826598 / 3840238 > > compare: 0.641679 = 282851 / 440798 > > iteratorDeref: 0.740565 = 3386062 / 4572271 > > iteratorCopy: 0.628798 = 5008472 / 7965150 > > iteratorEquals: 1.50206 = 794354 / 528843 > > iteratorInc: 1.39602 = 2164546 / 1550510 > > iteratorDec: 0.336639 = 837728 / 2488504 > > iteratorDiff: 0.313457 = 88552 / 282501 > > iteratorSub: 0.165388 = 3880 / 23460 > > iteratorAdd: 0.377644 = 27139 / 71864 > > iteratorAddAssign: 0.551364 = 182383 / 330785 > > iteratorArithmetic: 0.776766 = 4098582 / 5276467 > > micros: 0.71282 = 160398 / 225019 > > > > Distribution: mostly_decreasing(100000) > > Example: [20, 19, 20, 18, 18, 15, 13, 13, 14, 13, 10, 9, 10, 8, 5, 7, > > 6, 4, 3, 2] > > > > move: 0.760602 = 2197435 / 2889075 > > moveAssign: 0.768473 = 4591313 / 5974593 > > compare: 0.932913 = 977678 / 1047984 > > iteratorDeref: 0.838641 = 6633169 / 7909424 > > iteratorCopy: 0.829028 = 15447960 / 18633830 > > iteratorEquals: 1.7009 = 2000686 / 1176251 > > iteratorInc: 1.52771 = 3788203 / 2479657 > > iteratorDec: 0.363843 = 1369990 / 3765332 > > iteratorDiff: 0.489186 = 595472 / 1217270 > > iteratorSub: 0.0877827 = 10925 / 124455 > > iteratorAdd: 0.201991 = 69640 / 344767 > > iteratorAddAssign: 1.32756 = 1133512 / 853830 > > iteratorArithmetic: 0.900303 = 8968428 / 9961562 > > micros: 0.846298 = 377504 / 446065 > > > > Distribution: sawtooth_unique(100000) > > Example: [0, 3, 6, 9, 12, 15, 18, 1, 4, 7, 10, 13, 16, 19, 2, 5, 8, 11, 14, 17] > > > > move: 0.569953 = 2912465 / 5110007 > > moveAssign: 0.577701 = 6012427 / 10407511 > > compare: 0.824488 = 1604387 / 1945920 > > iteratorDeref: 0.678979 = 9299794 / 13696724 > > iteratorCopy: 0.72546 = 26157352 / 36056250 > > iteratorEquals: 2.00683 = 4131766 / 2058856 > > iteratorInc: 1.10728 = 6498257 / 5868688 > > iteratorDec: 0.111225 = 552519 / 4967557 > > iteratorDiff: 0.405328 = 988144 / 2437889 > > iteratorSub: 0.026249 = 6327 / 241038 > > iteratorAdd: 0.089034 = 55918 / 628052 > > iteratorAddAssign: 1.18737 = 2297233 / 1934717 > > iteratorArithmetic: 0.801143 = 14530164 / 18136797 > > micros: 0.706282 = 583937 / 826776 > > > > Distribution: sawtooth_dupes(100000) > > Example: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3] > > > > move: 0.547721 = 2348873 / 4288447 > > moveAssign: 0.543252 = 4613817 / 8492965 > > compare: 0.664642 = 841503 / 1266099 > > iteratorDeref: 0.599656 = 6292282 / 10493159 > > iteratorCopy: 0.60663 = 15336816 / 25281976 > > iteratorEquals: 2.24803 = 2859760 / 1272118 > > iteratorInc: 1.07142 = 4999546 / 4666276 > > iteratorDec: 0.0490551 = 205953 / 4198405 > > iteratorDiff: 0.314518 = 473157 / 1504390 > > iteratorSub: 0.0371951 = 5998 / 161258 > > iteratorAdd: 0.0875684 = 34672 / 395942 > > iteratorAddAssign: 0.900009 = 1233077 / 1370072 > > iteratorArithmetic: 0.72316 = 9812163 / 13568461 > > micros: 0.610891 = 353102 / 578011 > > > > Distribution: increasing_with_swap(100000) > > Example: [19, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 0] > > > > move: 1 = 291782 / 291782 > > moveAssign: 1 = 491779 / 491779 > > compare: 0.57449 = 225064 / 391763 > > iteratorDeref: 0.819553 = 941922 / 1149312 > > iteratorCopy: 0.377964 = 1392386 / 3683915 > > iteratorEquals: 0.525758 = 117408 / 223312 > > iteratorInc: 0.863708 = 333050 / 385605 > > iteratorDec: 0.999753 = 291716 / 291788 > > iteratorDiff: 0.224408 = 42528 / 189512 > > iteratorSub: 0.888889 = 104 / 117 > > iteratorAdd: 0.997151 = 16801 / 16849 > > iteratorAddAssign: 0.302604 = 82864 / 273836 > > iteratorArithmetic: 0.640448 = 884471 / 1381019 > > micros: 0.616124 = 36233 / 58808 > > > > Distribution: increasing_fast_slow(100000) > > Example: [0, 4, 8, 12, 0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18] > > > > move: 0.640348 = 825772 / 1289568 > > moveAssign: 0.627081 = 1559792 / 2487384 > > compare: 0.714235 = 431286 / 603843 > > iteratorDeref: 0.694913 = 2421875 / 3485151 > > iteratorCopy: 0.637176 = 5645877 / 8860786 > > iteratorEquals: 1.7552 = 953880 / 543458 > > iteratorInc: 1.08941 = 1676609 / 1539005 > > iteratorDec: 0.135904 = 154533 / 1137073 > > iteratorDiff: 0.388132 = 192787 / 496705 > > iteratorSub: 0.00577964 = 141 / 24396 > > iteratorAdd: 0.207287 = 17272 / 83324 > > iteratorAddAssign: 0.849015 = 431320 / 508024 > > iteratorArithmetic: 0.790987 = 3426542 / 4331985 > > micros: 0.777699 = 153552 / 197444 > > > > > > > > #include <iostream> > > #include <stdio.h> > > #include <vector> > > #include <cstdlib> > > #include <algorithm> > > #include <functional> > > #include <ranges> > > #include <chrono> > > #include <cmath> > > #include <numeric> > > > > using namespace std; > > > > #define LINE0 0 > > #define cout_line (cout << "line " << __LINE__ - LINE0 << ": ") > > #define vout(v) #v << " = " << (v) > > #define print1(v) cout_line << vout(v) << endl > > #define print2(v1, v2) cout_line << vout(v1) << ", " << vout(v2) << endl > > #define print3(v1, v2, v3) cout_line << vout(v1) << ", " << vout(v2) > > << ", " << vout(v3) << endl > > #define GET_MACRO(_1,_2,_3,NAME,...) NAME > > #define print(...) GET_MACRO(__VA_ARGS__, print3, print2, print1)(__VA_ARGS__) > > > > template <class Ch, class Tr, class Container> > > basic_ostream <Ch, Tr> & operator << (basic_ostream <Ch, Tr> & os, > > Container const& x); > > > > template <class X, class Y> > > ostream & operator << (ostream & os, pair <X, Y> const& p) { > > return os << p.first << ":" << p.second; > > } > > > > template <class Ch, class Tr, class Container> > > basic_ostream <Ch, Tr> & operator << (basic_ostream <Ch, Tr> & os, > > Container const& x) { > > os << "["; > > bool first = true; > > for(auto const& y : x) { > > if (!first) os << ", "; > > os << y; > > first = false; > > } > > return os << "]"; > > } > > > > > > struct Counter { > > int construct = 0; > > int copy = 0; > > int assign = 0; > > int move = 0; > > int moveAssign = 0; > > int compare = 0; > > int iteratorDeref = 0; > > int iteratorCopy = 0; > > int iteratorEquals = 0; > > int iteratorInc = 0; > > int iteratorDec = 0; > > int iteratorDiff = 0; > > int iteratorSub = 0; > > int iteratorAdd = 0; > > int iteratorAddAssign = 0; > > int iteratorArithmetic = 0; > > int micros = 0; > > }; > > > > int next_element_id = 0; > > > > class Element { > > Counter* counter; > > public: > > int val; > > int id; // Distinguis identical values to verify stability. > > Element(int val, Counter* counter) > > : val(val), counter(counter), id(++next_element_id) { > > ++counter->construct; > > } > > > > Element(Element const& other) > > : val(other.val), counter(other.counter), id(other.id) { > > ++counter->copy; > > } > > > > Element& operator=(Element const& other) { > > val = other.val; > > counter = other.counter; > > id = other.id; > > ++counter->assign; > > return *this; > > } > > > > Element(Element&& other) noexcept > > : val(move(other.val)), counter(other.counter), id(other.id) { > > ++counter->move; > > } > > > > Element& operator=(Element&& other) noexcept { > > val = move(other.val); > > counter = other.counter; > > id = other.id; > > ++counter->moveAssign; > > return *this; > > } > > > > bool operator<(Element const& other) const { > > ++counter->compare; > > return val < other.val; > > } > > > > bool operator==(Element const& other) const { > > return val == other.val && id == other.id; > > } > > }; > > > > ostream & operator << (ostream & os, Element const& e) { > > return os << e.val << "(" << e.id << ")"; > > } > > > > > > class RandomIterator { > > typedef vector<Element>::iterator It; > > It it; > > Counter* counter; > > > > public: > > using iterator_category = std::random_access_iterator_tag; > > using difference_type = std::ptrdiff_t; > > using value_type = Element; > > using pointer = value_type*; > > using reference = value_type&; > > > > RandomIterator(It it, Counter* counter) : it(it), counter(counter) { > > } > > > > RandomIterator(RandomIterator const& other) : it(other.it), > > counter(other.counter) { > > counter->iteratorCopy += 1; > > } > > > > bool operator== (RandomIterator other) { > > counter->iteratorEquals += 1; > > counter->iteratorArithmetic += 1; > > return it == other.it; > > } > > > > RandomIterator operator++ () { > > counter->iteratorInc += 1; > > counter->iteratorArithmetic += 1; > > return RandomIterator(++it, counter); > > } > > > > RandomIterator operator-- () { > > counter->iteratorDec += 1; > > counter->iteratorArithmetic += 1; > > return RandomIterator(--it, counter); > > } > > > > Element& operator* () { > > counter->iteratorDeref += 1; > > return *it; > > } > > > > difference_type operator- (RandomIterator other) { > > counter->iteratorDiff += 1; > > counter->iteratorArithmetic += 1; > > return it - other.it; > > } > > > > RandomIterator operator- (difference_type diff) { > > counter->iteratorSub += 1; > > counter->iteratorArithmetic += 1; > > return RandomIterator(it - diff, counter); > > } > > > > RandomIterator operator+ (difference_type diff) { > > counter->iteratorAdd += 1; > > counter->iteratorArithmetic += 1; > > return RandomIterator(it + diff, counter); > > } > > > > RandomIterator operator+= (difference_type diff) { > > counter->iteratorAddAssign += 1; > > counter->iteratorArithmetic += 1; > > return RandomIterator(it += diff, counter); > > } > > }; > > > > > > class TestCase { > > Counter counter; > > vector<Element> elements; > > > > public: > > TestCase(vector<int> const& data) { > > elements.reserve(data.size()); > > for (int i: data) { > > elements.push_back(Element(i, &counter)); > > } > > } > > > > Counter sort( > > void algo(RandomIterator, RandomIterator), > > vector<Element>::iterator begin, > > vector<Element>::iterator end) { > > counter = Counter(); > > auto t0 = std::chrono::system_clock::now(); > > > > algo(RandomIterator(begin, &counter), RandomIterator(end, &counter)); > > > > auto t1 = std::chrono::system_clock::now(); > > counter.micros = > > std::chrono::duration_cast<std::chrono::microseconds>(t1 - > > t0).count(); > > return counter; > > } > > > > pair<Counter,Counter> benchmark( > > void new_algo(RandomIterator, RandomIterator), > > void cur_algo(RandomIterator, RandomIterator)) { > > vector<Element> toSort1 = elements; > > Counter counter1 = sort(new_algo, toSort1.begin(), toSort1.end()); > > > > vector<Element> toSort2 = elements; > > Counter counter2 = sort(cur_algo, toSort2.begin(), toSort2.end()); > > > > if (toSort1 != toSort2) { > > cout << "INCORRECT RESULT!" << endl; > > exit(1); > > } > > > > return make_pair(counter1, counter2); > > } > > }; > > > > > > void relative_metric(string name, int m1, int m2) { > > if (m1 == 0 && m2 == 0) return; > > if (m2 == 0) { > > cout << name << ": inf = " << m1 << " / " << m2 << endl; > > } else { > > cout << name << ": " << m1 / (double) m2 << " = " << m1 << " / " > > << m2 << endl; > > } > > } > > > > > > void relative_report(Counter const& c1, Counter const& c2) { > > cout << endl; > > relative_metric("construct", c1.construct, c2.construct); > > relative_metric("copy", c1.copy, c2.copy); > > relative_metric("assign", c1.assign, c2.assign); > > relative_metric("move", c1.move, c2.move); > > relative_metric("moveAssign", c1.moveAssign, c2.moveAssign); > > relative_metric("compare", c1.compare, c2.compare); > > relative_metric("iteratorDeref", c1.iteratorDeref, c2.iteratorDeref); > > relative_metric("iteratorCopy", c1.iteratorCopy, c2.iteratorCopy); > > relative_metric("iteratorEquals", c1.iteratorEquals, c2.iteratorEquals); > > relative_metric("iteratorInc", c1.iteratorInc, c2.iteratorInc); > > relative_metric("iteratorDec", c1.iteratorDec, c2.iteratorDec); > > relative_metric("iteratorDiff", c1.iteratorDiff, c2.iteratorDiff); > > relative_metric("iteratorSub", c1.iteratorSub, c2.iteratorSub); > > relative_metric("iteratorAdd", c1.iteratorAdd, c2.iteratorAdd); > > relative_metric("iteratorAddAssign", c1.iteratorAddAssign, > > c2.iteratorAddAssign); > > relative_metric("iteratorArithmetic", c1.iteratorArithmetic, > > c2.iteratorArithmetic); > > relative_metric("micros", c1.micros, c2.micros); > > cout << endl; > > } > > > > > > /// This is a helper function for the merge routines. > > template<typename _BidirectionalIterator, typename _Distance, typename _Compare> > > void _my_merge_without_buffer( > > _BidirectionalIterator __first, > > _BidirectionalIterator __middle, > > _BidirectionalIterator __last, > > _Distance __len1, _Distance __len2, > > _Compare __comp) { > > > > if (__len1 == 0 || __len2 == 0) > > return; > > > > if (__len1 + __len2 == 2) { > > if (__comp(__middle, __first)) > > std::iter_swap(__first, __middle); > > return; > > } > > > > _BidirectionalIterator __first_cut = __first; > > _BidirectionalIterator __second_cut = __middle; > > _Distance __len11 = 0; > > _Distance __len22 = 0; > > > > > > #define NEW_ALGO 1 > > > > #ifdef NEW_ALGO > > // Prevent stack overflow > > if (__len1 > 4 * __len2 || __len2 > 4 * __len1) { > > #endif > > if (__len1 > __len2) > > { > > __len11 = __len1 / 2; > > std::advance(__first_cut, __len11); > > __second_cut > > = std::__lower_bound(__middle, __last, *__first_cut, > > __gnu_cxx::__ops::__iter_comp_val(__comp)); > > __len22 = std::distance(__middle, __second_cut); > > } > > else > > { > > __len22 = __len2 / 2; > > std::advance(__second_cut, __len22); > > __first_cut > > = std::__upper_bound(__first, __middle, *__second_cut, > > __gnu_cxx::__ops::__val_comp_iter(__comp)); > > __len11 = std::distance(__first, __first_cut); > > } > > #ifdef NEW_ALGO > > } else > > { > > _Distance __cut_range; > > // __cut_len is always equal to __middle - __first_cut (and eventually to > > // __second_cut - __middle) but we keep track of it to avoid calculating it > > // later which can be expensive for bidirectional iterators. > > _Distance __cut_len; > > > > if (__len1 > __len2) > > { > > std::advance(__first_cut, __len1 - __len2); > > __cut_range = __len2; > > } > > else > > { > > __cut_range = __len1; > > } > > __cut_len = __cut_range; > > > > while (__cut_range > 0) { > > _Distance __half_cut_range = __cut_range / 2; > > _BidirectionalIterator __mid_first_cut = __first_cut; > > std::advance(__mid_first_cut, __cut_range - __half_cut_range - 1); > > _BidirectionalIterator __mid_second_cut = __second_cut; > > std::advance(__mid_second_cut, __half_cut_range); > > > > if (__comp(__mid_second_cut, __mid_first_cut)) { > > __cut_range -= __half_cut_range + 1; > > __second_cut = __mid_second_cut; > > ++__second_cut; > > } else { > > __cut_len -= __cut_range - __half_cut_range; > > __cut_range = __half_cut_range; > > __first_cut = __mid_first_cut; > > ++__first_cut; > > } > > } > > __len22 = __cut_len; > > __len11 = __len1 - __cut_len; > > } > > #endif > > > > _BidirectionalIterator __new_middle > > = std::rotate(__first_cut, __middle, __second_cut); > > _my_merge_without_buffer(__first, __first_cut, __new_middle, > > __len11, __len22, __comp); > > _my_merge_without_buffer(__new_middle, __second_cut, __last, > > __len1 - __len11, __len2 - __len22, __comp); > > } > > > > /// This is a helper function for the stable sorting routines. > > template<typename _RandomAccessIterator, typename _Compare> > > void > > _my_inplace_stable_sort(_RandomAccessIterator __first, > > _RandomAccessIterator __last, _Compare __comp) > > { > > > > if (__last - __first < 15) > > { > > std::__insertion_sort(__first, __last, __comp); > > return; > > } > > _RandomAccessIterator __middle = __first + (__last - __first) / 2; > > _my_inplace_stable_sort(__first, __middle, __comp); > > _my_inplace_stable_sort(__middle, __last, __comp); > > _my_merge_without_buffer(__first, __middle, __last, > > __middle - __first, > > __last - __middle, > > __comp); > > } > > > > > > template<typename Iterator> > > void my_inplace_stable_sort(Iterator start, Iterator end) { > > _my_inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less_iter()); > > } > > > > template<typename Iterator> > > void std_inplace_stable_sort(Iterator start, Iterator end) { > > std::__inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less_iter()); > > } > > > > void benchmark(string const& dist_name, > > vector<int> dist_func(int), > > int n) { > > vector<int> v = dist_func(n); > > cout << "Distribution: " << dist_name << "(" << n << ")" << endl; > > cout << "Example: " << dist_func(20) << endl; > > TestCase testCase(v); > > > > pair<Counter, Counter> counter_pair = > > testCase.benchmark( > > my_inplace_stable_sort > > , std_inplace_stable_sort > > ); > > Counter cKloet = counter_pair.first; > > Counter cInplace = counter_pair.second; > > > > relative_report(cKloet, cInplace); > > } > > > > > > vector<int> random_unique(int n) { > > vector<int> v(n); > > for (int i = 0; i < n; ++i) { > > v[i] = i; > > } > > for (int i = n - 1; i >= 0; --i) { > > swap(v[i], v[rand() % (i + 1)]); > > } > > return v; > > } > > > > vector<int> random_dupes(int n) { > > vector<int> v(n); > > int r = (int)sqrt(n); > > for (int i = 0; i < n; ++i) { > > v[i] = i / r; > > } > > for (int i = n - 1; i >= 0; --i) { > > swap(v[i], v[rand() % (i + 1)]); > > } > > return v; > > } > > > > vector<int> increasing_unique(int n) { > > vector<int> v(n); > > for (int i = 0; i < n; ++i) { > > v[i] = i; > > } > > return v; > > } > > > > > > vector<int> increasing_dupes(int n) { > > vector<int> v(n); > > int r = (int)sqrt(n); > > for (int i = 0; i < n; ++i) { > > v[i] = i / r; > > } > > return v; > > } > > > > > > vector<int> mostly_increasing(int n) { > > vector<int> v(n); > > int r = (int)sqrt(n); > > for (int i = 0; i < n; ++i) { > > v[i] = i + rand() % r; > > } > > return v; > > } > > > > > > vector<int> decreasing_unique(int n) { > > vector<int> v(n); > > for (int i = 0; i < n; ++i) { > > v[i] = n - 1 - i; > > } > > return v; > > } > > > > vector<int> decreasing_dupes(int n) { > > vector<int> v(n); > > int r = (int)sqrt(n); > > for (int i = 0; i < n; ++i) { > > v[i] = (n - 1 - i) / r; > > } > > return v; > > } > > > > vector<int> mostly_decreasing(int n) { > > vector<int> v(n); > > int r = (int)sqrt(n); > > for (int i = 0; i < n; ++i) { > > v[i] = n - 1 - i + rand() % r; > > } > > return v; > > } > > > > vector<int> sawtooth_unique(int n) { > > int r = (int)sqrt(n); > > while (gcd(r, n) != 1) ++r; > > vector<int> v(n); > > for (int i = 0; i < n; ++i) { > > v[(i * r) % n] = i; > > } > > return v; > > } > > > > vector<int> sawtooth_dupes(int n) { > > int r = (int)sqrt(n); > > vector<int> v(n); > > for (int i = 0; i < n; ++i) { > > v[i] = i % r; > > } > > return v; > > } > > > > vector<int> increasing_with_swap(int n) { > > vector<int> v(n); > > for (int i = 0; i < n; ++i) { > > v[i] = i; > > } > > swap(v[0], v[n - 1]); > > return v; > > } > > > > vector<int> increasing_fast_slow(int n) { > > int r = (int)sqrt(n); > > vector<int> v(n); > > for (int i = 0; i < r; ++i) { > > v[i] = i * r; > > } > > for (int i = r; i < n; ++i) { > > v[i] = (i - r) * n / (n - r); > > } > > return v; > > } > > > > > > #define BENCHMARK(dist, n) benchmark(#dist, dist, n) > > > > int main() { > > int n = 100000; > > BENCHMARK(random_unique, n); > > BENCHMARK(random_dupes, n); > > BENCHMARK(increasing_unique, n); > > BENCHMARK(increasing_dupes, n); > > BENCHMARK(mostly_increasing, n); > > BENCHMARK(decreasing_unique, n); > > BENCHMARK(decreasing_dupes, n); > > BENCHMARK(mostly_decreasing, n); > > BENCHMARK(sawtooth_unique, n); > > BENCHMARK(sawtooth_dupes, n); > > BENCHMARK(increasing_with_swap, n); > > BENCHMARK(increasing_fast_slow, n); > > > > return 0; > > } > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: __inplace_stable_sort improvement 2021-11-24 18:48 ` David de Kloet @ 2021-11-25 5:35 ` François Dumont 2021-11-25 19:51 ` David de Kloet 0 siblings, 1 reply; 6+ messages in thread From: François Dumont @ 2021-11-25 5:35 UTC (permalink / raw) To: David de Kloet; +Cc: libstdc++ On 24/11/21 7:48 pm, David de Kloet wrote: > On Wed, Nov 24, 2021 at 7:33 PM François Dumont <frs.dumont@gmail.com> wrote: >> >> On 18/11/21 1:58 pm, David de Kloet via Libstdc++ wrote: >>> Hello libstdc++, >>> >>> Apologies if this is the wrong list for my message. I believe I have an >>> improvement for __inplace_stable_sort and I'm wondering if this is worth >>> pursuing and if so how to proceed. >> Of course, anyone interested in enhancing the library is welcome. >> >> The best way to do so would be to provide a patch on the current library >> code. I'm the last one to have touch to this algo and it would be easier >> for me to understand if you try to implement it directly on the library. > _my_merge_without_buffer below preprocesses exactly to > __merge_without_buffer from the library if you undefine NEW_ALGO so > you can see exactly what my changes are. > > I currently don't have much time, but when I do I'd be happy to > provide it as a patch. If you find time before me, you're also more > than welcome to use the code. > >> The good news is that you are trying to enhance a piece of code rarely >> used as it is only when the buffer allocation fails. So it is less >> critical and so perhaps more open to modifications. >> >> During my last work on it I've added a performance test: >> performance/25_algorithms/inplace_merge.cc. It is also supposed to check >> performance when we fail to get a buffer and so use >> __merge_without_buffer. Use 'make check-performance' to run it, see >> library manual guide. Do not hesitate to change it if you think that >> your benches are better. > I don't currently have the environment to `make check-performance`. I > just copied the code from the web and adjusted it. But (again) when I > have time, I'll look into that. > > If you know of any "contributing to stdlibc++" guides, that would be great too. https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html > >>> __inplace_stable_sort does a merge sort where in each recursive step the 2 >>> sublists are merged in-place. The merge step is itself also recursive. In order >>> to merge [A, B) and [B, C), a tail section S1 of [A, B) and a head section S2 >>> of [B, C) are found such that all elements in S1 are greater than all elements >>> in S2 and then those sections are swapped/rotated and the shorter sections are >>> merged recursively. This is done by the function __merge_without_buffer. >>> >>> In the current implementation, __merge_without_buffer takes the half point of >>> the longer section and does a binary search (via lower_bound or upper_bound) to >>> find the corresponding cut point on the other section. Then it rotates the >>> sections. >>> >>> My proposal is to find the cut points such that both rotated sections have the >>> same length such that they can always simply be swapped. >> What do you mean by swapped ? > I mean the first element of section 1 gets swapped with the first > element of section 2. The second element of section 1 gets swapped > with the second element of section 2. Etc. for each index. > >> I see in the code below that you are just >> changing how the cut positions are computed and then call the same >> std::rotate algo. I cannot see any reason why changing the position >> would make the rotate faster. > The implementation of rotate does a series of such section swaps. But > if both section have the same length, it's only a single swap of > sections which makes rotate about 1.5-2 times faster. Ok, I see now, you try to use as much as possible the std::swap_ranges in __rotate for random access iterator. Maybe it means that this optimization should be limited to this type of iterator. We'll see depending on benches. I'll have a try then when I have time. Thanks > > The purpose of the code I added is to choose the cut positions such > that the rotated sections are of equal length, which makes rotate > faster by doing fewer element swaps, resulting in fewer calls to the > copy/assignment constructor. > >>> I have implemented >>> this and it seems to perform better in general and in the special cases that >>> I've tried. >>> >>> Unfortunately, since we are no longer guaranteed to halve the longer section in >>> each step, there is a possibility for a stack overflow. To avoid this, the new >>> algorithm falls back to the old behavior whenever the sections have >>> significantly different lengths. >>> >>> Also, where the current algorithm performs the binary search implicitly via >>> lower_bound/upper_bound, the new algorithm has to implement the binary search >>> explicitly because it has to look at both sections to determine the next step >>> in the search. >>> >>> I have created wrappers for my value objects as well as my iterators to count >>> the number of operations performed by the current and my algorithm and run >>> benchmarks on various distributions. My code is at the bottom but below is the >>> output as run on my machine. For each operation it outputs the ratio between >>> the number of operations performed by my algorithm over the number performed by >>> the current algorithm. >>> >>> Thanks for your attention, >>> David >>> >>> >>> Distribution: random_unique(100000) >>> Example: [19, 13, 5, 0, 10, 6, 15, 11, 16, 9, 14, 18, 4, 17, 1, 7, 8, 12, 3, 2] >>> >>> move: 0.588026 = 3331848 / 5666154 >>> moveAssign: 0.594789 = 6852819 / 11521431 >>> compare: 0.914997 = 2028977 / 2217469 >>> iteratorDeref: 0.718466 = 11001863 / 15312994 >>> iteratorCopy: 0.791332 = 33977189 / 42936712 >>> iteratorEquals: 2.27841 = 5174606 / 2271143 >>> iteratorInc: 1.2143 = 7697971 / 6339449 >>> iteratorDec: 0.0833529 = 473556 / 5681340 >>> iteratorDiff: 0.444865 = 1362598 / 3062950 >>> iteratorSub: 0.0200683 = 6272 / 312533 >>> iteratorAdd: 0.0695467 = 57149 / 821736 >>> iteratorAddAssign: 1.36491 = 3141949 / 2301950 >>> iteratorArithmetic: 0.861623 = 17914101 / 20791101 >>> micros: 0.663704 = 650825 / 980596 >>> >>> Distribution: random_dupes(100000) >>> Example: [4, 4, 1, 3, 0, 3, 2, 2, 4, 4, 0, 3, 1, 0, 3, 2, 0, 2, 1, 1] >>> >>> move: 0.570041 = 2949623 / 5174401 >>> moveAssign: 0.577736 = 6087819 / 10537375 >>> compare: 0.845419 = 1530877 / 1810790 >>> iteratorDeref: 0.678233 = 9232869 / 13613114 >>> iteratorCopy: 0.742699 = 25652244 / 34539225 >>> iteratorEquals: 2.41643 = 4123881 / 1706601 >>> iteratorInc: 1.15617 = 6554374 / 5669028 >>> iteratorDec: 0.101004 = 539991 / 5346219 >>> iteratorDiff: 0.417093 = 959972 / 2301580 >>> iteratorSub: 0.0350046 = 8882 / 253738 >>> iteratorAdd: 0.0972571 = 64122 / 659304 >>> iteratorAddAssign: 1.2279 = 2247197 / 1830107 >>> iteratorArithmetic: 0.81605 = 14498419 / 17766577 >>> micros: 0.699386 = 554426 / 792732 >>> >>> Distribution: increasing_unique(100000) >>> Example: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] >>> >>> move: 1 = 91808 / 91808 >>> moveAssign: 1 = 91808 / 91808 >>> compare: 0.573591 = 224558 / 391495 >>> iteratorDeref: 0.722385 = 540924 / 748803 >>> iteratorCopy: 0.298175 = 976735 / 3275708 >>> iteratorEquals: 0.522302 = 116383 / 222827 >>> iteratorInc: 0.715444 = 132750 / 185549 >>> iteratorDec: 1 = 91808 / 91808 >>> iteratorDiff: 0.217455 = 40956 / 188342 >>> iteratorAdd: 1 = 16383 / 16383 >>> iteratorAddAssign: 0.299523 = 81884 / 273381 >>> iteratorArithmetic: 0.49082 = 480164 / 978290 >>> micros: 0.479792 = 22935 / 47802 >>> >>> Distribution: increasing_dupes(100000) >>> Example: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4] >>> >>> move: 1 = 91808 / 91808 >>> moveAssign: 1 = 91808 / 91808 >>> compare: 0.573591 = 224558 / 391495 >>> iteratorDeref: 0.722385 = 540924 / 748803 >>> iteratorCopy: 0.298175 = 976735 / 3275708 >>> iteratorEquals: 0.522302 = 116383 / 222827 >>> iteratorInc: 0.715444 = 132750 / 185549 >>> iteratorDec: 1 = 91808 / 91808 >>> iteratorDiff: 0.217455 = 40956 / 188342 >>> iteratorAdd: 1 = 16383 / 16383 >>> iteratorAddAssign: 0.299523 = 81884 / 273381 >>> iteratorArithmetic: 0.49082 = 480164 / 978290 >>> micros: 0.449154 = 20675 / 46031 >>> >>> Distribution: mostly_increasing(100000) >>> Example: [3, 2, 3, 4, 5, 5, 6, 7, 9, 9, 11, 13, 12, 14, 15, 17, 17, 18, 21, 20] >>> >>> move: 0.681093 = 814725 / 1196203 >>> moveAssign: 0.703421 = 1809563 / 2572519 >>> compare: 0.924016 = 965320 / 1044700 >>> iteratorDeref: 0.847765 = 3802634 / 4485480 >>> iteratorCopy: 0.804068 = 11859111 / 14748894 >>> iteratorEquals: 1.59826 = 1388752 / 868913 >>> iteratorInc: 1.25659 = 1818421 / 1447111 >>> iteratorDec: 0.383738 = 537108 / 1399674 >>> iteratorDiff: 0.486488 = 562273 / 1155779 >>> iteratorSub: 0.0638659 = 6666 / 104375 >>> iteratorAdd: 0.184194 = 55586 / 301780 >>> iteratorAddAssign: 1.30369 = 1105439 / 847932 >>> iteratorArithmetic: 0.893672 = 5474245 / 6125564 >>> micros: 0.790603 = 233953 / 295917 >>> >>> Distribution: decreasing_unique(100000) >>> Example: [19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0] >>> >>> move: 0.40597 = 995312 / 2451688 >>> moveAssign: 0.45785 = 2459840 / 5372592 >>> compare: 0.489247 = 151631 / 309927 >>> iteratorDeref: 0.524798 = 3415934 / 6509043 >>> iteratorCopy: 0.563628 = 5971275 / 10594360 >>> iteratorEquals: 1.18395 = 843533 / 712475 >>> iteratorInc: 0.864454 = 1460463 / 1689462 >>> iteratorDec: 0.373125 = 1606976 / 4306800 >>> iteratorDiff: 0.495226 = 235930 / 476409 >>> iteratorSub: 0.248254 = 13824 / 55685 >>> iteratorAdd: 0.590877 = 135839 / 229894 >>> iteratorAddAssign: 0.421852 = 119646 / 283621 >>> iteratorArithmetic: 0.569514 = 4416211 / 7754346 >>> micros: 0.543145 = 167865 / 309061 >>> >>> Distribution: decreasing_dupes(100000) >>> Example: [4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0] >>> >>> move: 0.741695 = 1455276 / 1962096 >>> moveAssign: 0.736048 = 2826598 / 3840238 >>> compare: 0.641679 = 282851 / 440798 >>> iteratorDeref: 0.740565 = 3386062 / 4572271 >>> iteratorCopy: 0.628798 = 5008472 / 7965150 >>> iteratorEquals: 1.50206 = 794354 / 528843 >>> iteratorInc: 1.39602 = 2164546 / 1550510 >>> iteratorDec: 0.336639 = 837728 / 2488504 >>> iteratorDiff: 0.313457 = 88552 / 282501 >>> iteratorSub: 0.165388 = 3880 / 23460 >>> iteratorAdd: 0.377644 = 27139 / 71864 >>> iteratorAddAssign: 0.551364 = 182383 / 330785 >>> iteratorArithmetic: 0.776766 = 4098582 / 5276467 >>> micros: 0.71282 = 160398 / 225019 >>> >>> Distribution: mostly_decreasing(100000) >>> Example: [20, 19, 20, 18, 18, 15, 13, 13, 14, 13, 10, 9, 10, 8, 5, 7, >>> 6, 4, 3, 2] >>> >>> move: 0.760602 = 2197435 / 2889075 >>> moveAssign: 0.768473 = 4591313 / 5974593 >>> compare: 0.932913 = 977678 / 1047984 >>> iteratorDeref: 0.838641 = 6633169 / 7909424 >>> iteratorCopy: 0.829028 = 15447960 / 18633830 >>> iteratorEquals: 1.7009 = 2000686 / 1176251 >>> iteratorInc: 1.52771 = 3788203 / 2479657 >>> iteratorDec: 0.363843 = 1369990 / 3765332 >>> iteratorDiff: 0.489186 = 595472 / 1217270 >>> iteratorSub: 0.0877827 = 10925 / 124455 >>> iteratorAdd: 0.201991 = 69640 / 344767 >>> iteratorAddAssign: 1.32756 = 1133512 / 853830 >>> iteratorArithmetic: 0.900303 = 8968428 / 9961562 >>> micros: 0.846298 = 377504 / 446065 >>> >>> Distribution: sawtooth_unique(100000) >>> Example: [0, 3, 6, 9, 12, 15, 18, 1, 4, 7, 10, 13, 16, 19, 2, 5, 8, 11, 14, 17] >>> >>> move: 0.569953 = 2912465 / 5110007 >>> moveAssign: 0.577701 = 6012427 / 10407511 >>> compare: 0.824488 = 1604387 / 1945920 >>> iteratorDeref: 0.678979 = 9299794 / 13696724 >>> iteratorCopy: 0.72546 = 26157352 / 36056250 >>> iteratorEquals: 2.00683 = 4131766 / 2058856 >>> iteratorInc: 1.10728 = 6498257 / 5868688 >>> iteratorDec: 0.111225 = 552519 / 4967557 >>> iteratorDiff: 0.405328 = 988144 / 2437889 >>> iteratorSub: 0.026249 = 6327 / 241038 >>> iteratorAdd: 0.089034 = 55918 / 628052 >>> iteratorAddAssign: 1.18737 = 2297233 / 1934717 >>> iteratorArithmetic: 0.801143 = 14530164 / 18136797 >>> micros: 0.706282 = 583937 / 826776 >>> >>> Distribution: sawtooth_dupes(100000) >>> Example: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3] >>> >>> move: 0.547721 = 2348873 / 4288447 >>> moveAssign: 0.543252 = 4613817 / 8492965 >>> compare: 0.664642 = 841503 / 1266099 >>> iteratorDeref: 0.599656 = 6292282 / 10493159 >>> iteratorCopy: 0.60663 = 15336816 / 25281976 >>> iteratorEquals: 2.24803 = 2859760 / 1272118 >>> iteratorInc: 1.07142 = 4999546 / 4666276 >>> iteratorDec: 0.0490551 = 205953 / 4198405 >>> iteratorDiff: 0.314518 = 473157 / 1504390 >>> iteratorSub: 0.0371951 = 5998 / 161258 >>> iteratorAdd: 0.0875684 = 34672 / 395942 >>> iteratorAddAssign: 0.900009 = 1233077 / 1370072 >>> iteratorArithmetic: 0.72316 = 9812163 / 13568461 >>> micros: 0.610891 = 353102 / 578011 >>> >>> Distribution: increasing_with_swap(100000) >>> Example: [19, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 0] >>> >>> move: 1 = 291782 / 291782 >>> moveAssign: 1 = 491779 / 491779 >>> compare: 0.57449 = 225064 / 391763 >>> iteratorDeref: 0.819553 = 941922 / 1149312 >>> iteratorCopy: 0.377964 = 1392386 / 3683915 >>> iteratorEquals: 0.525758 = 117408 / 223312 >>> iteratorInc: 0.863708 = 333050 / 385605 >>> iteratorDec: 0.999753 = 291716 / 291788 >>> iteratorDiff: 0.224408 = 42528 / 189512 >>> iteratorSub: 0.888889 = 104 / 117 >>> iteratorAdd: 0.997151 = 16801 / 16849 >>> iteratorAddAssign: 0.302604 = 82864 / 273836 >>> iteratorArithmetic: 0.640448 = 884471 / 1381019 >>> micros: 0.616124 = 36233 / 58808 >>> >>> Distribution: increasing_fast_slow(100000) >>> Example: [0, 4, 8, 12, 0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18] >>> >>> move: 0.640348 = 825772 / 1289568 >>> moveAssign: 0.627081 = 1559792 / 2487384 >>> compare: 0.714235 = 431286 / 603843 >>> iteratorDeref: 0.694913 = 2421875 / 3485151 >>> iteratorCopy: 0.637176 = 5645877 / 8860786 >>> iteratorEquals: 1.7552 = 953880 / 543458 >>> iteratorInc: 1.08941 = 1676609 / 1539005 >>> iteratorDec: 0.135904 = 154533 / 1137073 >>> iteratorDiff: 0.388132 = 192787 / 496705 >>> iteratorSub: 0.00577964 = 141 / 24396 >>> iteratorAdd: 0.207287 = 17272 / 83324 >>> iteratorAddAssign: 0.849015 = 431320 / 508024 >>> iteratorArithmetic: 0.790987 = 3426542 / 4331985 >>> micros: 0.777699 = 153552 / 197444 >>> >>> >>> >>> #include <iostream> >>> #include <stdio.h> >>> #include <vector> >>> #include <cstdlib> >>> #include <algorithm> >>> #include <functional> >>> #include <ranges> >>> #include <chrono> >>> #include <cmath> >>> #include <numeric> >>> >>> using namespace std; >>> >>> #define LINE0 0 >>> #define cout_line (cout << "line " << __LINE__ - LINE0 << ": ") >>> #define vout(v) #v << " = " << (v) >>> #define print1(v) cout_line << vout(v) << endl >>> #define print2(v1, v2) cout_line << vout(v1) << ", " << vout(v2) << endl >>> #define print3(v1, v2, v3) cout_line << vout(v1) << ", " << vout(v2) >>> << ", " << vout(v3) << endl >>> #define GET_MACRO(_1,_2,_3,NAME,...) NAME >>> #define print(...) GET_MACRO(__VA_ARGS__, print3, print2, print1)(__VA_ARGS__) >>> >>> template <class Ch, class Tr, class Container> >>> basic_ostream <Ch, Tr> & operator << (basic_ostream <Ch, Tr> & os, >>> Container const& x); >>> >>> template <class X, class Y> >>> ostream & operator << (ostream & os, pair <X, Y> const& p) { >>> return os << p.first << ":" << p.second; >>> } >>> >>> template <class Ch, class Tr, class Container> >>> basic_ostream <Ch, Tr> & operator << (basic_ostream <Ch, Tr> & os, >>> Container const& x) { >>> os << "["; >>> bool first = true; >>> for(auto const& y : x) { >>> if (!first) os << ", "; >>> os << y; >>> first = false; >>> } >>> return os << "]"; >>> } >>> >>> >>> struct Counter { >>> int construct = 0; >>> int copy = 0; >>> int assign = 0; >>> int move = 0; >>> int moveAssign = 0; >>> int compare = 0; >>> int iteratorDeref = 0; >>> int iteratorCopy = 0; >>> int iteratorEquals = 0; >>> int iteratorInc = 0; >>> int iteratorDec = 0; >>> int iteratorDiff = 0; >>> int iteratorSub = 0; >>> int iteratorAdd = 0; >>> int iteratorAddAssign = 0; >>> int iteratorArithmetic = 0; >>> int micros = 0; >>> }; >>> >>> int next_element_id = 0; >>> >>> class Element { >>> Counter* counter; >>> public: >>> int val; >>> int id; // Distinguis identical values to verify stability. >>> Element(int val, Counter* counter) >>> : val(val), counter(counter), id(++next_element_id) { >>> ++counter->construct; >>> } >>> >>> Element(Element const& other) >>> : val(other.val), counter(other.counter), id(other.id) { >>> ++counter->copy; >>> } >>> >>> Element& operator=(Element const& other) { >>> val = other.val; >>> counter = other.counter; >>> id = other.id; >>> ++counter->assign; >>> return *this; >>> } >>> >>> Element(Element&& other) noexcept >>> : val(move(other.val)), counter(other.counter), id(other.id) { >>> ++counter->move; >>> } >>> >>> Element& operator=(Element&& other) noexcept { >>> val = move(other.val); >>> counter = other.counter; >>> id = other.id; >>> ++counter->moveAssign; >>> return *this; >>> } >>> >>> bool operator<(Element const& other) const { >>> ++counter->compare; >>> return val < other.val; >>> } >>> >>> bool operator==(Element const& other) const { >>> return val == other.val && id == other.id; >>> } >>> }; >>> >>> ostream & operator << (ostream & os, Element const& e) { >>> return os << e.val << "(" << e.id << ")"; >>> } >>> >>> >>> class RandomIterator { >>> typedef vector<Element>::iterator It; >>> It it; >>> Counter* counter; >>> >>> public: >>> using iterator_category = std::random_access_iterator_tag; >>> using difference_type = std::ptrdiff_t; >>> using value_type = Element; >>> using pointer = value_type*; >>> using reference = value_type&; >>> >>> RandomIterator(It it, Counter* counter) : it(it), counter(counter) { >>> } >>> >>> RandomIterator(RandomIterator const& other) : it(other.it), >>> counter(other.counter) { >>> counter->iteratorCopy += 1; >>> } >>> >>> bool operator== (RandomIterator other) { >>> counter->iteratorEquals += 1; >>> counter->iteratorArithmetic += 1; >>> return it == other.it; >>> } >>> >>> RandomIterator operator++ () { >>> counter->iteratorInc += 1; >>> counter->iteratorArithmetic += 1; >>> return RandomIterator(++it, counter); >>> } >>> >>> RandomIterator operator-- () { >>> counter->iteratorDec += 1; >>> counter->iteratorArithmetic += 1; >>> return RandomIterator(--it, counter); >>> } >>> >>> Element& operator* () { >>> counter->iteratorDeref += 1; >>> return *it; >>> } >>> >>> difference_type operator- (RandomIterator other) { >>> counter->iteratorDiff += 1; >>> counter->iteratorArithmetic += 1; >>> return it - other.it; >>> } >>> >>> RandomIterator operator- (difference_type diff) { >>> counter->iteratorSub += 1; >>> counter->iteratorArithmetic += 1; >>> return RandomIterator(it - diff, counter); >>> } >>> >>> RandomIterator operator+ (difference_type diff) { >>> counter->iteratorAdd += 1; >>> counter->iteratorArithmetic += 1; >>> return RandomIterator(it + diff, counter); >>> } >>> >>> RandomIterator operator+= (difference_type diff) { >>> counter->iteratorAddAssign += 1; >>> counter->iteratorArithmetic += 1; >>> return RandomIterator(it += diff, counter); >>> } >>> }; >>> >>> >>> class TestCase { >>> Counter counter; >>> vector<Element> elements; >>> >>> public: >>> TestCase(vector<int> const& data) { >>> elements.reserve(data.size()); >>> for (int i: data) { >>> elements.push_back(Element(i, &counter)); >>> } >>> } >>> >>> Counter sort( >>> void algo(RandomIterator, RandomIterator), >>> vector<Element>::iterator begin, >>> vector<Element>::iterator end) { >>> counter = Counter(); >>> auto t0 = std::chrono::system_clock::now(); >>> >>> algo(RandomIterator(begin, &counter), RandomIterator(end, &counter)); >>> >>> auto t1 = std::chrono::system_clock::now(); >>> counter.micros = >>> std::chrono::duration_cast<std::chrono::microseconds>(t1 - >>> t0).count(); >>> return counter; >>> } >>> >>> pair<Counter,Counter> benchmark( >>> void new_algo(RandomIterator, RandomIterator), >>> void cur_algo(RandomIterator, RandomIterator)) { >>> vector<Element> toSort1 = elements; >>> Counter counter1 = sort(new_algo, toSort1.begin(), toSort1.end()); >>> >>> vector<Element> toSort2 = elements; >>> Counter counter2 = sort(cur_algo, toSort2.begin(), toSort2.end()); >>> >>> if (toSort1 != toSort2) { >>> cout << "INCORRECT RESULT!" << endl; >>> exit(1); >>> } >>> >>> return make_pair(counter1, counter2); >>> } >>> }; >>> >>> >>> void relative_metric(string name, int m1, int m2) { >>> if (m1 == 0 && m2 == 0) return; >>> if (m2 == 0) { >>> cout << name << ": inf = " << m1 << " / " << m2 << endl; >>> } else { >>> cout << name << ": " << m1 / (double) m2 << " = " << m1 << " / " >>> << m2 << endl; >>> } >>> } >>> >>> >>> void relative_report(Counter const& c1, Counter const& c2) { >>> cout << endl; >>> relative_metric("construct", c1.construct, c2.construct); >>> relative_metric("copy", c1.copy, c2.copy); >>> relative_metric("assign", c1.assign, c2.assign); >>> relative_metric("move", c1.move, c2.move); >>> relative_metric("moveAssign", c1.moveAssign, c2.moveAssign); >>> relative_metric("compare", c1.compare, c2.compare); >>> relative_metric("iteratorDeref", c1.iteratorDeref, c2.iteratorDeref); >>> relative_metric("iteratorCopy", c1.iteratorCopy, c2.iteratorCopy); >>> relative_metric("iteratorEquals", c1.iteratorEquals, c2.iteratorEquals); >>> relative_metric("iteratorInc", c1.iteratorInc, c2.iteratorInc); >>> relative_metric("iteratorDec", c1.iteratorDec, c2.iteratorDec); >>> relative_metric("iteratorDiff", c1.iteratorDiff, c2.iteratorDiff); >>> relative_metric("iteratorSub", c1.iteratorSub, c2.iteratorSub); >>> relative_metric("iteratorAdd", c1.iteratorAdd, c2.iteratorAdd); >>> relative_metric("iteratorAddAssign", c1.iteratorAddAssign, >>> c2.iteratorAddAssign); >>> relative_metric("iteratorArithmetic", c1.iteratorArithmetic, >>> c2.iteratorArithmetic); >>> relative_metric("micros", c1.micros, c2.micros); >>> cout << endl; >>> } >>> >>> >>> /// This is a helper function for the merge routines. >>> template<typename _BidirectionalIterator, typename _Distance, typename _Compare> >>> void _my_merge_without_buffer( >>> _BidirectionalIterator __first, >>> _BidirectionalIterator __middle, >>> _BidirectionalIterator __last, >>> _Distance __len1, _Distance __len2, >>> _Compare __comp) { >>> >>> if (__len1 == 0 || __len2 == 0) >>> return; >>> >>> if (__len1 + __len2 == 2) { >>> if (__comp(__middle, __first)) >>> std::iter_swap(__first, __middle); >>> return; >>> } >>> >>> _BidirectionalIterator __first_cut = __first; >>> _BidirectionalIterator __second_cut = __middle; >>> _Distance __len11 = 0; >>> _Distance __len22 = 0; >>> >>> >>> #define NEW_ALGO 1 >>> >>> #ifdef NEW_ALGO >>> // Prevent stack overflow >>> if (__len1 > 4 * __len2 || __len2 > 4 * __len1) { >>> #endif >>> if (__len1 > __len2) >>> { >>> __len11 = __len1 / 2; >>> std::advance(__first_cut, __len11); >>> __second_cut >>> = std::__lower_bound(__middle, __last, *__first_cut, >>> __gnu_cxx::__ops::__iter_comp_val(__comp)); >>> __len22 = std::distance(__middle, __second_cut); >>> } >>> else >>> { >>> __len22 = __len2 / 2; >>> std::advance(__second_cut, __len22); >>> __first_cut >>> = std::__upper_bound(__first, __middle, *__second_cut, >>> __gnu_cxx::__ops::__val_comp_iter(__comp)); >>> __len11 = std::distance(__first, __first_cut); >>> } >>> #ifdef NEW_ALGO >>> } else >>> { >>> _Distance __cut_range; >>> // __cut_len is always equal to __middle - __first_cut (and eventually to >>> // __second_cut - __middle) but we keep track of it to avoid calculating it >>> // later which can be expensive for bidirectional iterators. >>> _Distance __cut_len; >>> >>> if (__len1 > __len2) >>> { >>> std::advance(__first_cut, __len1 - __len2); >>> __cut_range = __len2; >>> } >>> else >>> { >>> __cut_range = __len1; >>> } >>> __cut_len = __cut_range; >>> >>> while (__cut_range > 0) { >>> _Distance __half_cut_range = __cut_range / 2; >>> _BidirectionalIterator __mid_first_cut = __first_cut; >>> std::advance(__mid_first_cut, __cut_range - __half_cut_range - 1); >>> _BidirectionalIterator __mid_second_cut = __second_cut; >>> std::advance(__mid_second_cut, __half_cut_range); >>> >>> if (__comp(__mid_second_cut, __mid_first_cut)) { >>> __cut_range -= __half_cut_range + 1; >>> __second_cut = __mid_second_cut; >>> ++__second_cut; >>> } else { >>> __cut_len -= __cut_range - __half_cut_range; >>> __cut_range = __half_cut_range; >>> __first_cut = __mid_first_cut; >>> ++__first_cut; >>> } >>> } >>> __len22 = __cut_len; >>> __len11 = __len1 - __cut_len; >>> } >>> #endif >>> >>> _BidirectionalIterator __new_middle >>> = std::rotate(__first_cut, __middle, __second_cut); >>> _my_merge_without_buffer(__first, __first_cut, __new_middle, >>> __len11, __len22, __comp); >>> _my_merge_without_buffer(__new_middle, __second_cut, __last, >>> __len1 - __len11, __len2 - __len22, __comp); >>> } >>> >>> /// This is a helper function for the stable sorting routines. >>> template<typename _RandomAccessIterator, typename _Compare> >>> void >>> _my_inplace_stable_sort(_RandomAccessIterator __first, >>> _RandomAccessIterator __last, _Compare __comp) >>> { >>> >>> if (__last - __first < 15) >>> { >>> std::__insertion_sort(__first, __last, __comp); >>> return; >>> } >>> _RandomAccessIterator __middle = __first + (__last - __first) / 2; >>> _my_inplace_stable_sort(__first, __middle, __comp); >>> _my_inplace_stable_sort(__middle, __last, __comp); >>> _my_merge_without_buffer(__first, __middle, __last, >>> __middle - __first, >>> __last - __middle, >>> __comp); >>> } >>> >>> >>> template<typename Iterator> >>> void my_inplace_stable_sort(Iterator start, Iterator end) { >>> _my_inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less_iter()); >>> } >>> >>> template<typename Iterator> >>> void std_inplace_stable_sort(Iterator start, Iterator end) { >>> std::__inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less_iter()); >>> } >>> >>> void benchmark(string const& dist_name, >>> vector<int> dist_func(int), >>> int n) { >>> vector<int> v = dist_func(n); >>> cout << "Distribution: " << dist_name << "(" << n << ")" << endl; >>> cout << "Example: " << dist_func(20) << endl; >>> TestCase testCase(v); >>> >>> pair<Counter, Counter> counter_pair = >>> testCase.benchmark( >>> my_inplace_stable_sort >>> , std_inplace_stable_sort >>> ); >>> Counter cKloet = counter_pair.first; >>> Counter cInplace = counter_pair.second; >>> >>> relative_report(cKloet, cInplace); >>> } >>> >>> >>> vector<int> random_unique(int n) { >>> vector<int> v(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = i; >>> } >>> for (int i = n - 1; i >= 0; --i) { >>> swap(v[i], v[rand() % (i + 1)]); >>> } >>> return v; >>> } >>> >>> vector<int> random_dupes(int n) { >>> vector<int> v(n); >>> int r = (int)sqrt(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = i / r; >>> } >>> for (int i = n - 1; i >= 0; --i) { >>> swap(v[i], v[rand() % (i + 1)]); >>> } >>> return v; >>> } >>> >>> vector<int> increasing_unique(int n) { >>> vector<int> v(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = i; >>> } >>> return v; >>> } >>> >>> >>> vector<int> increasing_dupes(int n) { >>> vector<int> v(n); >>> int r = (int)sqrt(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = i / r; >>> } >>> return v; >>> } >>> >>> >>> vector<int> mostly_increasing(int n) { >>> vector<int> v(n); >>> int r = (int)sqrt(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = i + rand() % r; >>> } >>> return v; >>> } >>> >>> >>> vector<int> decreasing_unique(int n) { >>> vector<int> v(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = n - 1 - i; >>> } >>> return v; >>> } >>> >>> vector<int> decreasing_dupes(int n) { >>> vector<int> v(n); >>> int r = (int)sqrt(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = (n - 1 - i) / r; >>> } >>> return v; >>> } >>> >>> vector<int> mostly_decreasing(int n) { >>> vector<int> v(n); >>> int r = (int)sqrt(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = n - 1 - i + rand() % r; >>> } >>> return v; >>> } >>> >>> vector<int> sawtooth_unique(int n) { >>> int r = (int)sqrt(n); >>> while (gcd(r, n) != 1) ++r; >>> vector<int> v(n); >>> for (int i = 0; i < n; ++i) { >>> v[(i * r) % n] = i; >>> } >>> return v; >>> } >>> >>> vector<int> sawtooth_dupes(int n) { >>> int r = (int)sqrt(n); >>> vector<int> v(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = i % r; >>> } >>> return v; >>> } >>> >>> vector<int> increasing_with_swap(int n) { >>> vector<int> v(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = i; >>> } >>> swap(v[0], v[n - 1]); >>> return v; >>> } >>> >>> vector<int> increasing_fast_slow(int n) { >>> int r = (int)sqrt(n); >>> vector<int> v(n); >>> for (int i = 0; i < r; ++i) { >>> v[i] = i * r; >>> } >>> for (int i = r; i < n; ++i) { >>> v[i] = (i - r) * n / (n - r); >>> } >>> return v; >>> } >>> >>> >>> #define BENCHMARK(dist, n) benchmark(#dist, dist, n) >>> >>> int main() { >>> int n = 100000; >>> BENCHMARK(random_unique, n); >>> BENCHMARK(random_dupes, n); >>> BENCHMARK(increasing_unique, n); >>> BENCHMARK(increasing_dupes, n); >>> BENCHMARK(mostly_increasing, n); >>> BENCHMARK(decreasing_unique, n); >>> BENCHMARK(decreasing_dupes, n); >>> BENCHMARK(mostly_decreasing, n); >>> BENCHMARK(sawtooth_unique, n); >>> BENCHMARK(sawtooth_dupes, n); >>> BENCHMARK(increasing_with_swap, n); >>> BENCHMARK(increasing_fast_slow, n); >>> >>> return 0; >>> } >> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: __inplace_stable_sort improvement 2021-11-25 5:35 ` François Dumont @ 2021-11-25 19:51 ` David de Kloet 2021-12-10 15:31 ` David de Kloet 0 siblings, 1 reply; 6+ messages in thread From: David de Kloet @ 2021-11-25 19:51 UTC (permalink / raw) To: François Dumont; +Cc: libstdc++ On Thu, Nov 25, 2021 at 6:35 AM François Dumont <frs.dumont@gmail.com> wrote: > > On 24/11/21 7:48 pm, David de Kloet wrote: > > On Wed, Nov 24, 2021 at 7:33 PM François Dumont <frs.dumont@gmail.com> wrote: > >> > >> On 18/11/21 1:58 pm, David de Kloet via Libstdc++ wrote: > >>> Hello libstdc++, > >>> > >>> Apologies if this is the wrong list for my message. I believe I have an > >>> improvement for __inplace_stable_sort and I'm wondering if this is worth > >>> pursuing and if so how to proceed. > >> Of course, anyone interested in enhancing the library is welcome. > >> > >> The best way to do so would be to provide a patch on the current library > >> code. I'm the last one to have touch to this algo and it would be easier > >> for me to understand if you try to implement it directly on the library. > > _my_merge_without_buffer below preprocesses exactly to > > __merge_without_buffer from the library if you undefine NEW_ALGO so > > you can see exactly what my changes are. > > > > I currently don't have much time, but when I do I'd be happy to > > provide it as a patch. If you find time before me, you're also more > > than welcome to use the code. > > > >> The good news is that you are trying to enhance a piece of code rarely > >> used as it is only when the buffer allocation fails. So it is less > >> critical and so perhaps more open to modifications. > >> > >> During my last work on it I've added a performance test: > >> performance/25_algorithms/inplace_merge.cc. It is also supposed to check > >> performance when we fail to get a buffer and so use > >> __merge_without_buffer. Use 'make check-performance' to run it, see > >> library manual guide. Do not hesitate to change it if you think that > >> your benches are better. > > I don't currently have the environment to `make check-performance`. I > > just copied the code from the web and adjusted it. But (again) when I > > have time, I'll look into that. > > > > If you know of any "contributing to stdlibc++" guides, that would be great too. > > https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html > > > > > >>> __inplace_stable_sort does a merge sort where in each recursive step the 2 > >>> sublists are merged in-place. The merge step is itself also recursive. In order > >>> to merge [A, B) and [B, C), a tail section S1 of [A, B) and a head section S2 > >>> of [B, C) are found such that all elements in S1 are greater than all elements > >>> in S2 and then those sections are swapped/rotated and the shorter sections are > >>> merged recursively. This is done by the function __merge_without_buffer. > >>> > >>> In the current implementation, __merge_without_buffer takes the half point of > >>> the longer section and does a binary search (via lower_bound or upper_bound) to > >>> find the corresponding cut point on the other section. Then it rotates the > >>> sections. > >>> > >>> My proposal is to find the cut points such that both rotated sections have the > >>> same length such that they can always simply be swapped. > >> What do you mean by swapped ? > > I mean the first element of section 1 gets swapped with the first > > element of section 2. The second element of section 1 gets swapped > > with the second element of section 2. Etc. for each index. > > > >> I see in the code below that you are just > >> changing how the cut positions are computed and then call the same > >> std::rotate algo. I cannot see any reason why changing the position > >> would make the rotate faster. > > The implementation of rotate does a series of such section swaps. But > > if both section have the same length, it's only a single swap of > > sections which makes rotate about 1.5-2 times faster. > > Ok, I see now, you try to use as much as possible the std::swap_ranges > in __rotate for random access iterator. Maybe it means that this > optimization should be limited to this type of iterator. We'll see > depending on benches. The binary search implementation is a bit awkward for the bidi-iterator, so having a separate implementation for random access iterators might improve the readability. However, if you look at my benchmarks, it also does fewer comparisons than the current algorithm. But rotate() doesn't do any comparisons. So there must be another benefit to choosing the cut points this way, but I'm not entirely sure why. > > I'll have a try then when I have time. > > Thanks > > > > > The purpose of the code I added is to choose the cut positions such > > that the rotated sections are of equal length, which makes rotate > > faster by doing fewer element swaps, resulting in fewer calls to the > > copy/assignment constructor. > > > >>> I have implemented > >>> this and it seems to perform better in general and in the special cases that > >>> I've tried. > >>> > >>> Unfortunately, since we are no longer guaranteed to halve the longer section in > >>> each step, there is a possibility for a stack overflow. To avoid this, the new > >>> algorithm falls back to the old behavior whenever the sections have > >>> significantly different lengths. > >>> > >>> Also, where the current algorithm performs the binary search implicitly via > >>> lower_bound/upper_bound, the new algorithm has to implement the binary search > >>> explicitly because it has to look at both sections to determine the next step > >>> in the search. > >>> > >>> I have created wrappers for my value objects as well as my iterators to count > >>> the number of operations performed by the current and my algorithm and run > >>> benchmarks on various distributions. My code is at the bottom but below is the > >>> output as run on my machine. For each operation it outputs the ratio between > >>> the number of operations performed by my algorithm over the number performed by > >>> the current algorithm. > >>> > >>> Thanks for your attention, > >>> David > >>> > >>> > >>> Distribution: random_unique(100000) > >>> Example: [19, 13, 5, 0, 10, 6, 15, 11, 16, 9, 14, 18, 4, 17, 1, 7, 8, 12, 3, 2] > >>> > >>> move: 0.588026 = 3331848 / 5666154 > >>> moveAssign: 0.594789 = 6852819 / 11521431 > >>> compare: 0.914997 = 2028977 / 2217469 > >>> iteratorDeref: 0.718466 = 11001863 / 15312994 > >>> iteratorCopy: 0.791332 = 33977189 / 42936712 > >>> iteratorEquals: 2.27841 = 5174606 / 2271143 > >>> iteratorInc: 1.2143 = 7697971 / 6339449 > >>> iteratorDec: 0.0833529 = 473556 / 5681340 > >>> iteratorDiff: 0.444865 = 1362598 / 3062950 > >>> iteratorSub: 0.0200683 = 6272 / 312533 > >>> iteratorAdd: 0.0695467 = 57149 / 821736 > >>> iteratorAddAssign: 1.36491 = 3141949 / 2301950 > >>> iteratorArithmetic: 0.861623 = 17914101 / 20791101 > >>> micros: 0.663704 = 650825 / 980596 > >>> > >>> Distribution: random_dupes(100000) > >>> Example: [4, 4, 1, 3, 0, 3, 2, 2, 4, 4, 0, 3, 1, 0, 3, 2, 0, 2, 1, 1] > >>> > >>> move: 0.570041 = 2949623 / 5174401 > >>> moveAssign: 0.577736 = 6087819 / 10537375 > >>> compare: 0.845419 = 1530877 / 1810790 > >>> iteratorDeref: 0.678233 = 9232869 / 13613114 > >>> iteratorCopy: 0.742699 = 25652244 / 34539225 > >>> iteratorEquals: 2.41643 = 4123881 / 1706601 > >>> iteratorInc: 1.15617 = 6554374 / 5669028 > >>> iteratorDec: 0.101004 = 539991 / 5346219 > >>> iteratorDiff: 0.417093 = 959972 / 2301580 > >>> iteratorSub: 0.0350046 = 8882 / 253738 > >>> iteratorAdd: 0.0972571 = 64122 / 659304 > >>> iteratorAddAssign: 1.2279 = 2247197 / 1830107 > >>> iteratorArithmetic: 0.81605 = 14498419 / 17766577 > >>> micros: 0.699386 = 554426 / 792732 > >>> > >>> Distribution: increasing_unique(100000) > >>> Example: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] > >>> > >>> move: 1 = 91808 / 91808 > >>> moveAssign: 1 = 91808 / 91808 > >>> compare: 0.573591 = 224558 / 391495 > >>> iteratorDeref: 0.722385 = 540924 / 748803 > >>> iteratorCopy: 0.298175 = 976735 / 3275708 > >>> iteratorEquals: 0.522302 = 116383 / 222827 > >>> iteratorInc: 0.715444 = 132750 / 185549 > >>> iteratorDec: 1 = 91808 / 91808 > >>> iteratorDiff: 0.217455 = 40956 / 188342 > >>> iteratorAdd: 1 = 16383 / 16383 > >>> iteratorAddAssign: 0.299523 = 81884 / 273381 > >>> iteratorArithmetic: 0.49082 = 480164 / 978290 > >>> micros: 0.479792 = 22935 / 47802 > >>> > >>> Distribution: increasing_dupes(100000) > >>> Example: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4] > >>> > >>> move: 1 = 91808 / 91808 > >>> moveAssign: 1 = 91808 / 91808 > >>> compare: 0.573591 = 224558 / 391495 > >>> iteratorDeref: 0.722385 = 540924 / 748803 > >>> iteratorCopy: 0.298175 = 976735 / 3275708 > >>> iteratorEquals: 0.522302 = 116383 / 222827 > >>> iteratorInc: 0.715444 = 132750 / 185549 > >>> iteratorDec: 1 = 91808 / 91808 > >>> iteratorDiff: 0.217455 = 40956 / 188342 > >>> iteratorAdd: 1 = 16383 / 16383 > >>> iteratorAddAssign: 0.299523 = 81884 / 273381 > >>> iteratorArithmetic: 0.49082 = 480164 / 978290 > >>> micros: 0.449154 = 20675 / 46031 > >>> > >>> Distribution: mostly_increasing(100000) > >>> Example: [3, 2, 3, 4, 5, 5, 6, 7, 9, 9, 11, 13, 12, 14, 15, 17, 17, 18, 21, 20] > >>> > >>> move: 0.681093 = 814725 / 1196203 > >>> moveAssign: 0.703421 = 1809563 / 2572519 > >>> compare: 0.924016 = 965320 / 1044700 > >>> iteratorDeref: 0.847765 = 3802634 / 4485480 > >>> iteratorCopy: 0.804068 = 11859111 / 14748894 > >>> iteratorEquals: 1.59826 = 1388752 / 868913 > >>> iteratorInc: 1.25659 = 1818421 / 1447111 > >>> iteratorDec: 0.383738 = 537108 / 1399674 > >>> iteratorDiff: 0.486488 = 562273 / 1155779 > >>> iteratorSub: 0.0638659 = 6666 / 104375 > >>> iteratorAdd: 0.184194 = 55586 / 301780 > >>> iteratorAddAssign: 1.30369 = 1105439 / 847932 > >>> iteratorArithmetic: 0.893672 = 5474245 / 6125564 > >>> micros: 0.790603 = 233953 / 295917 > >>> > >>> Distribution: decreasing_unique(100000) > >>> Example: [19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0] > >>> > >>> move: 0.40597 = 995312 / 2451688 > >>> moveAssign: 0.45785 = 2459840 / 5372592 > >>> compare: 0.489247 = 151631 / 309927 > >>> iteratorDeref: 0.524798 = 3415934 / 6509043 > >>> iteratorCopy: 0.563628 = 5971275 / 10594360 > >>> iteratorEquals: 1.18395 = 843533 / 712475 > >>> iteratorInc: 0.864454 = 1460463 / 1689462 > >>> iteratorDec: 0.373125 = 1606976 / 4306800 > >>> iteratorDiff: 0.495226 = 235930 / 476409 > >>> iteratorSub: 0.248254 = 13824 / 55685 > >>> iteratorAdd: 0.590877 = 135839 / 229894 > >>> iteratorAddAssign: 0.421852 = 119646 / 283621 > >>> iteratorArithmetic: 0.569514 = 4416211 / 7754346 > >>> micros: 0.543145 = 167865 / 309061 > >>> > >>> Distribution: decreasing_dupes(100000) > >>> Example: [4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0] > >>> > >>> move: 0.741695 = 1455276 / 1962096 > >>> moveAssign: 0.736048 = 2826598 / 3840238 > >>> compare: 0.641679 = 282851 / 440798 > >>> iteratorDeref: 0.740565 = 3386062 / 4572271 > >>> iteratorCopy: 0.628798 = 5008472 / 7965150 > >>> iteratorEquals: 1.50206 = 794354 / 528843 > >>> iteratorInc: 1.39602 = 2164546 / 1550510 > >>> iteratorDec: 0.336639 = 837728 / 2488504 > >>> iteratorDiff: 0.313457 = 88552 / 282501 > >>> iteratorSub: 0.165388 = 3880 / 23460 > >>> iteratorAdd: 0.377644 = 27139 / 71864 > >>> iteratorAddAssign: 0.551364 = 182383 / 330785 > >>> iteratorArithmetic: 0.776766 = 4098582 / 5276467 > >>> micros: 0.71282 = 160398 / 225019 > >>> > >>> Distribution: mostly_decreasing(100000) > >>> Example: [20, 19, 20, 18, 18, 15, 13, 13, 14, 13, 10, 9, 10, 8, 5, 7, > >>> 6, 4, 3, 2] > >>> > >>> move: 0.760602 = 2197435 / 2889075 > >>> moveAssign: 0.768473 = 4591313 / 5974593 > >>> compare: 0.932913 = 977678 / 1047984 > >>> iteratorDeref: 0.838641 = 6633169 / 7909424 > >>> iteratorCopy: 0.829028 = 15447960 / 18633830 > >>> iteratorEquals: 1.7009 = 2000686 / 1176251 > >>> iteratorInc: 1.52771 = 3788203 / 2479657 > >>> iteratorDec: 0.363843 = 1369990 / 3765332 > >>> iteratorDiff: 0.489186 = 595472 / 1217270 > >>> iteratorSub: 0.0877827 = 10925 / 124455 > >>> iteratorAdd: 0.201991 = 69640 / 344767 > >>> iteratorAddAssign: 1.32756 = 1133512 / 853830 > >>> iteratorArithmetic: 0.900303 = 8968428 / 9961562 > >>> micros: 0.846298 = 377504 / 446065 > >>> > >>> Distribution: sawtooth_unique(100000) > >>> Example: [0, 3, 6, 9, 12, 15, 18, 1, 4, 7, 10, 13, 16, 19, 2, 5, 8, 11, 14, 17] > >>> > >>> move: 0.569953 = 2912465 / 5110007 > >>> moveAssign: 0.577701 = 6012427 / 10407511 > >>> compare: 0.824488 = 1604387 / 1945920 > >>> iteratorDeref: 0.678979 = 9299794 / 13696724 > >>> iteratorCopy: 0.72546 = 26157352 / 36056250 > >>> iteratorEquals: 2.00683 = 4131766 / 2058856 > >>> iteratorInc: 1.10728 = 6498257 / 5868688 > >>> iteratorDec: 0.111225 = 552519 / 4967557 > >>> iteratorDiff: 0.405328 = 988144 / 2437889 > >>> iteratorSub: 0.026249 = 6327 / 241038 > >>> iteratorAdd: 0.089034 = 55918 / 628052 > >>> iteratorAddAssign: 1.18737 = 2297233 / 1934717 > >>> iteratorArithmetic: 0.801143 = 14530164 / 18136797 > >>> micros: 0.706282 = 583937 / 826776 > >>> > >>> Distribution: sawtooth_dupes(100000) > >>> Example: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3] > >>> > >>> move: 0.547721 = 2348873 / 4288447 > >>> moveAssign: 0.543252 = 4613817 / 8492965 > >>> compare: 0.664642 = 841503 / 1266099 > >>> iteratorDeref: 0.599656 = 6292282 / 10493159 > >>> iteratorCopy: 0.60663 = 15336816 / 25281976 > >>> iteratorEquals: 2.24803 = 2859760 / 1272118 > >>> iteratorInc: 1.07142 = 4999546 / 4666276 > >>> iteratorDec: 0.0490551 = 205953 / 4198405 > >>> iteratorDiff: 0.314518 = 473157 / 1504390 > >>> iteratorSub: 0.0371951 = 5998 / 161258 > >>> iteratorAdd: 0.0875684 = 34672 / 395942 > >>> iteratorAddAssign: 0.900009 = 1233077 / 1370072 > >>> iteratorArithmetic: 0.72316 = 9812163 / 13568461 > >>> micros: 0.610891 = 353102 / 578011 > >>> > >>> Distribution: increasing_with_swap(100000) > >>> Example: [19, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 0] > >>> > >>> move: 1 = 291782 / 291782 > >>> moveAssign: 1 = 491779 / 491779 > >>> compare: 0.57449 = 225064 / 391763 > >>> iteratorDeref: 0.819553 = 941922 / 1149312 > >>> iteratorCopy: 0.377964 = 1392386 / 3683915 > >>> iteratorEquals: 0.525758 = 117408 / 223312 > >>> iteratorInc: 0.863708 = 333050 / 385605 > >>> iteratorDec: 0.999753 = 291716 / 291788 > >>> iteratorDiff: 0.224408 = 42528 / 189512 > >>> iteratorSub: 0.888889 = 104 / 117 > >>> iteratorAdd: 0.997151 = 16801 / 16849 > >>> iteratorAddAssign: 0.302604 = 82864 / 273836 > >>> iteratorArithmetic: 0.640448 = 884471 / 1381019 > >>> micros: 0.616124 = 36233 / 58808 > >>> > >>> Distribution: increasing_fast_slow(100000) > >>> Example: [0, 4, 8, 12, 0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18] > >>> > >>> move: 0.640348 = 825772 / 1289568 > >>> moveAssign: 0.627081 = 1559792 / 2487384 > >>> compare: 0.714235 = 431286 / 603843 > >>> iteratorDeref: 0.694913 = 2421875 / 3485151 > >>> iteratorCopy: 0.637176 = 5645877 / 8860786 > >>> iteratorEquals: 1.7552 = 953880 / 543458 > >>> iteratorInc: 1.08941 = 1676609 / 1539005 > >>> iteratorDec: 0.135904 = 154533 / 1137073 > >>> iteratorDiff: 0.388132 = 192787 / 496705 > >>> iteratorSub: 0.00577964 = 141 / 24396 > >>> iteratorAdd: 0.207287 = 17272 / 83324 > >>> iteratorAddAssign: 0.849015 = 431320 / 508024 > >>> iteratorArithmetic: 0.790987 = 3426542 / 4331985 > >>> micros: 0.777699 = 153552 / 197444 > >>> > >>> > >>> > >>> #include <iostream> > >>> #include <stdio.h> > >>> #include <vector> > >>> #include <cstdlib> > >>> #include <algorithm> > >>> #include <functional> > >>> #include <ranges> > >>> #include <chrono> > >>> #include <cmath> > >>> #include <numeric> > >>> > >>> using namespace std; > >>> > >>> #define LINE0 0 > >>> #define cout_line (cout << "line " << __LINE__ - LINE0 << ": ") > >>> #define vout(v) #v << " = " << (v) > >>> #define print1(v) cout_line << vout(v) << endl > >>> #define print2(v1, v2) cout_line << vout(v1) << ", " << vout(v2) << endl > >>> #define print3(v1, v2, v3) cout_line << vout(v1) << ", " << vout(v2) > >>> << ", " << vout(v3) << endl > >>> #define GET_MACRO(_1,_2,_3,NAME,...) NAME > >>> #define print(...) GET_MACRO(__VA_ARGS__, print3, print2, print1)(__VA_ARGS__) > >>> > >>> template <class Ch, class Tr, class Container> > >>> basic_ostream <Ch, Tr> & operator << (basic_ostream <Ch, Tr> & os, > >>> Container const& x); > >>> > >>> template <class X, class Y> > >>> ostream & operator << (ostream & os, pair <X, Y> const& p) { > >>> return os << p.first << ":" << p.second; > >>> } > >>> > >>> template <class Ch, class Tr, class Container> > >>> basic_ostream <Ch, Tr> & operator << (basic_ostream <Ch, Tr> & os, > >>> Container const& x) { > >>> os << "["; > >>> bool first = true; > >>> for(auto const& y : x) { > >>> if (!first) os << ", "; > >>> os << y; > >>> first = false; > >>> } > >>> return os << "]"; > >>> } > >>> > >>> > >>> struct Counter { > >>> int construct = 0; > >>> int copy = 0; > >>> int assign = 0; > >>> int move = 0; > >>> int moveAssign = 0; > >>> int compare = 0; > >>> int iteratorDeref = 0; > >>> int iteratorCopy = 0; > >>> int iteratorEquals = 0; > >>> int iteratorInc = 0; > >>> int iteratorDec = 0; > >>> int iteratorDiff = 0; > >>> int iteratorSub = 0; > >>> int iteratorAdd = 0; > >>> int iteratorAddAssign = 0; > >>> int iteratorArithmetic = 0; > >>> int micros = 0; > >>> }; > >>> > >>> int next_element_id = 0; > >>> > >>> class Element { > >>> Counter* counter; > >>> public: > >>> int val; > >>> int id; // Distinguis identical values to verify stability. > >>> Element(int val, Counter* counter) > >>> : val(val), counter(counter), id(++next_element_id) { > >>> ++counter->construct; > >>> } > >>> > >>> Element(Element const& other) > >>> : val(other.val), counter(other.counter), id(other.id) { > >>> ++counter->copy; > >>> } > >>> > >>> Element& operator=(Element const& other) { > >>> val = other.val; > >>> counter = other.counter; > >>> id = other.id; > >>> ++counter->assign; > >>> return *this; > >>> } > >>> > >>> Element(Element&& other) noexcept > >>> : val(move(other.val)), counter(other.counter), id(other.id) { > >>> ++counter->move; > >>> } > >>> > >>> Element& operator=(Element&& other) noexcept { > >>> val = move(other.val); > >>> counter = other.counter; > >>> id = other.id; > >>> ++counter->moveAssign; > >>> return *this; > >>> } > >>> > >>> bool operator<(Element const& other) const { > >>> ++counter->compare; > >>> return val < other.val; > >>> } > >>> > >>> bool operator==(Element const& other) const { > >>> return val == other.val && id == other.id; > >>> } > >>> }; > >>> > >>> ostream & operator << (ostream & os, Element const& e) { > >>> return os << e.val << "(" << e.id << ")"; > >>> } > >>> > >>> > >>> class RandomIterator { > >>> typedef vector<Element>::iterator It; > >>> It it; > >>> Counter* counter; > >>> > >>> public: > >>> using iterator_category = std::random_access_iterator_tag; > >>> using difference_type = std::ptrdiff_t; > >>> using value_type = Element; > >>> using pointer = value_type*; > >>> using reference = value_type&; > >>> > >>> RandomIterator(It it, Counter* counter) : it(it), counter(counter) { > >>> } > >>> > >>> RandomIterator(RandomIterator const& other) : it(other.it), > >>> counter(other.counter) { > >>> counter->iteratorCopy += 1; > >>> } > >>> > >>> bool operator== (RandomIterator other) { > >>> counter->iteratorEquals += 1; > >>> counter->iteratorArithmetic += 1; > >>> return it == other.it; > >>> } > >>> > >>> RandomIterator operator++ () { > >>> counter->iteratorInc += 1; > >>> counter->iteratorArithmetic += 1; > >>> return RandomIterator(++it, counter); > >>> } > >>> > >>> RandomIterator operator-- () { > >>> counter->iteratorDec += 1; > >>> counter->iteratorArithmetic += 1; > >>> return RandomIterator(--it, counter); > >>> } > >>> > >>> Element& operator* () { > >>> counter->iteratorDeref += 1; > >>> return *it; > >>> } > >>> > >>> difference_type operator- (RandomIterator other) { > >>> counter->iteratorDiff += 1; > >>> counter->iteratorArithmetic += 1; > >>> return it - other.it; > >>> } > >>> > >>> RandomIterator operator- (difference_type diff) { > >>> counter->iteratorSub += 1; > >>> counter->iteratorArithmetic += 1; > >>> return RandomIterator(it - diff, counter); > >>> } > >>> > >>> RandomIterator operator+ (difference_type diff) { > >>> counter->iteratorAdd += 1; > >>> counter->iteratorArithmetic += 1; > >>> return RandomIterator(it + diff, counter); > >>> } > >>> > >>> RandomIterator operator+= (difference_type diff) { > >>> counter->iteratorAddAssign += 1; > >>> counter->iteratorArithmetic += 1; > >>> return RandomIterator(it += diff, counter); > >>> } > >>> }; > >>> > >>> > >>> class TestCase { > >>> Counter counter; > >>> vector<Element> elements; > >>> > >>> public: > >>> TestCase(vector<int> const& data) { > >>> elements.reserve(data.size()); > >>> for (int i: data) { > >>> elements.push_back(Element(i, &counter)); > >>> } > >>> } > >>> > >>> Counter sort( > >>> void algo(RandomIterator, RandomIterator), > >>> vector<Element>::iterator begin, > >>> vector<Element>::iterator end) { > >>> counter = Counter(); > >>> auto t0 = std::chrono::system_clock::now(); > >>> > >>> algo(RandomIterator(begin, &counter), RandomIterator(end, &counter)); > >>> > >>> auto t1 = std::chrono::system_clock::now(); > >>> counter.micros = > >>> std::chrono::duration_cast<std::chrono::microseconds>(t1 - > >>> t0).count(); > >>> return counter; > >>> } > >>> > >>> pair<Counter,Counter> benchmark( > >>> void new_algo(RandomIterator, RandomIterator), > >>> void cur_algo(RandomIterator, RandomIterator)) { > >>> vector<Element> toSort1 = elements; > >>> Counter counter1 = sort(new_algo, toSort1.begin(), toSort1.end()); > >>> > >>> vector<Element> toSort2 = elements; > >>> Counter counter2 = sort(cur_algo, toSort2.begin(), toSort2.end()); > >>> > >>> if (toSort1 != toSort2) { > >>> cout << "INCORRECT RESULT!" << endl; > >>> exit(1); > >>> } > >>> > >>> return make_pair(counter1, counter2); > >>> } > >>> }; > >>> > >>> > >>> void relative_metric(string name, int m1, int m2) { > >>> if (m1 == 0 && m2 == 0) return; > >>> if (m2 == 0) { > >>> cout << name << ": inf = " << m1 << " / " << m2 << endl; > >>> } else { > >>> cout << name << ": " << m1 / (double) m2 << " = " << m1 << " / " > >>> << m2 << endl; > >>> } > >>> } > >>> > >>> > >>> void relative_report(Counter const& c1, Counter const& c2) { > >>> cout << endl; > >>> relative_metric("construct", c1.construct, c2.construct); > >>> relative_metric("copy", c1.copy, c2.copy); > >>> relative_metric("assign", c1.assign, c2.assign); > >>> relative_metric("move", c1.move, c2.move); > >>> relative_metric("moveAssign", c1.moveAssign, c2.moveAssign); > >>> relative_metric("compare", c1.compare, c2.compare); > >>> relative_metric("iteratorDeref", c1.iteratorDeref, c2.iteratorDeref); > >>> relative_metric("iteratorCopy", c1.iteratorCopy, c2.iteratorCopy); > >>> relative_metric("iteratorEquals", c1.iteratorEquals, c2.iteratorEquals); > >>> relative_metric("iteratorInc", c1.iteratorInc, c2.iteratorInc); > >>> relative_metric("iteratorDec", c1.iteratorDec, c2.iteratorDec); > >>> relative_metric("iteratorDiff", c1.iteratorDiff, c2.iteratorDiff); > >>> relative_metric("iteratorSub", c1.iteratorSub, c2.iteratorSub); > >>> relative_metric("iteratorAdd", c1.iteratorAdd, c2.iteratorAdd); > >>> relative_metric("iteratorAddAssign", c1.iteratorAddAssign, > >>> c2.iteratorAddAssign); > >>> relative_metric("iteratorArithmetic", c1.iteratorArithmetic, > >>> c2.iteratorArithmetic); > >>> relative_metric("micros", c1.micros, c2.micros); > >>> cout << endl; > >>> } > >>> > >>> > >>> /// This is a helper function for the merge routines. > >>> template<typename _BidirectionalIterator, typename _Distance, typename _Compare> > >>> void _my_merge_without_buffer( > >>> _BidirectionalIterator __first, > >>> _BidirectionalIterator __middle, > >>> _BidirectionalIterator __last, > >>> _Distance __len1, _Distance __len2, > >>> _Compare __comp) { > >>> > >>> if (__len1 == 0 || __len2 == 0) > >>> return; > >>> > >>> if (__len1 + __len2 == 2) { > >>> if (__comp(__middle, __first)) > >>> std::iter_swap(__first, __middle); > >>> return; > >>> } > >>> > >>> _BidirectionalIterator __first_cut = __first; > >>> _BidirectionalIterator __second_cut = __middle; > >>> _Distance __len11 = 0; > >>> _Distance __len22 = 0; > >>> > >>> > >>> #define NEW_ALGO 1 > >>> > >>> #ifdef NEW_ALGO > >>> // Prevent stack overflow > >>> if (__len1 > 4 * __len2 || __len2 > 4 * __len1) { > >>> #endif > >>> if (__len1 > __len2) > >>> { > >>> __len11 = __len1 / 2; > >>> std::advance(__first_cut, __len11); > >>> __second_cut > >>> = std::__lower_bound(__middle, __last, *__first_cut, > >>> __gnu_cxx::__ops::__iter_comp_val(__comp)); > >>> __len22 = std::distance(__middle, __second_cut); > >>> } > >>> else > >>> { > >>> __len22 = __len2 / 2; > >>> std::advance(__second_cut, __len22); > >>> __first_cut > >>> = std::__upper_bound(__first, __middle, *__second_cut, > >>> __gnu_cxx::__ops::__val_comp_iter(__comp)); > >>> __len11 = std::distance(__first, __first_cut); > >>> } > >>> #ifdef NEW_ALGO > >>> } else > >>> { > >>> _Distance __cut_range; > >>> // __cut_len is always equal to __middle - __first_cut (and eventually to > >>> // __second_cut - __middle) but we keep track of it to avoid calculating it > >>> // later which can be expensive for bidirectional iterators. > >>> _Distance __cut_len; > >>> > >>> if (__len1 > __len2) > >>> { > >>> std::advance(__first_cut, __len1 - __len2); > >>> __cut_range = __len2; > >>> } > >>> else > >>> { > >>> __cut_range = __len1; > >>> } > >>> __cut_len = __cut_range; > >>> > >>> while (__cut_range > 0) { > >>> _Distance __half_cut_range = __cut_range / 2; > >>> _BidirectionalIterator __mid_first_cut = __first_cut; > >>> std::advance(__mid_first_cut, __cut_range - __half_cut_range - 1); > >>> _BidirectionalIterator __mid_second_cut = __second_cut; > >>> std::advance(__mid_second_cut, __half_cut_range); > >>> > >>> if (__comp(__mid_second_cut, __mid_first_cut)) { > >>> __cut_range -= __half_cut_range + 1; > >>> __second_cut = __mid_second_cut; > >>> ++__second_cut; > >>> } else { > >>> __cut_len -= __cut_range - __half_cut_range; > >>> __cut_range = __half_cut_range; > >>> __first_cut = __mid_first_cut; > >>> ++__first_cut; > >>> } > >>> } > >>> __len22 = __cut_len; > >>> __len11 = __len1 - __cut_len; > >>> } > >>> #endif > >>> > >>> _BidirectionalIterator __new_middle > >>> = std::rotate(__first_cut, __middle, __second_cut); > >>> _my_merge_without_buffer(__first, __first_cut, __new_middle, > >>> __len11, __len22, __comp); > >>> _my_merge_without_buffer(__new_middle, __second_cut, __last, > >>> __len1 - __len11, __len2 - __len22, __comp); > >>> } > >>> > >>> /// This is a helper function for the stable sorting routines. > >>> template<typename _RandomAccessIterator, typename _Compare> > >>> void > >>> _my_inplace_stable_sort(_RandomAccessIterator __first, > >>> _RandomAccessIterator __last, _Compare __comp) > >>> { > >>> > >>> if (__last - __first < 15) > >>> { > >>> std::__insertion_sort(__first, __last, __comp); > >>> return; > >>> } > >>> _RandomAccessIterator __middle = __first + (__last - __first) / 2; > >>> _my_inplace_stable_sort(__first, __middle, __comp); > >>> _my_inplace_stable_sort(__middle, __last, __comp); > >>> _my_merge_without_buffer(__first, __middle, __last, > >>> __middle - __first, > >>> __last - __middle, > >>> __comp); > >>> } > >>> > >>> > >>> template<typename Iterator> > >>> void my_inplace_stable_sort(Iterator start, Iterator end) { > >>> _my_inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less_iter()); > >>> } > >>> > >>> template<typename Iterator> > >>> void std_inplace_stable_sort(Iterator start, Iterator end) { > >>> std::__inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less_iter()); > >>> } > >>> > >>> void benchmark(string const& dist_name, > >>> vector<int> dist_func(int), > >>> int n) { > >>> vector<int> v = dist_func(n); > >>> cout << "Distribution: " << dist_name << "(" << n << ")" << endl; > >>> cout << "Example: " << dist_func(20) << endl; > >>> TestCase testCase(v); > >>> > >>> pair<Counter, Counter> counter_pair = > >>> testCase.benchmark( > >>> my_inplace_stable_sort > >>> , std_inplace_stable_sort > >>> ); > >>> Counter cKloet = counter_pair.first; > >>> Counter cInplace = counter_pair.second; > >>> > >>> relative_report(cKloet, cInplace); > >>> } > >>> > >>> > >>> vector<int> random_unique(int n) { > >>> vector<int> v(n); > >>> for (int i = 0; i < n; ++i) { > >>> v[i] = i; > >>> } > >>> for (int i = n - 1; i >= 0; --i) { > >>> swap(v[i], v[rand() % (i + 1)]); > >>> } > >>> return v; > >>> } > >>> > >>> vector<int> random_dupes(int n) { > >>> vector<int> v(n); > >>> int r = (int)sqrt(n); > >>> for (int i = 0; i < n; ++i) { > >>> v[i] = i / r; > >>> } > >>> for (int i = n - 1; i >= 0; --i) { > >>> swap(v[i], v[rand() % (i + 1)]); > >>> } > >>> return v; > >>> } > >>> > >>> vector<int> increasing_unique(int n) { > >>> vector<int> v(n); > >>> for (int i = 0; i < n; ++i) { > >>> v[i] = i; > >>> } > >>> return v; > >>> } > >>> > >>> > >>> vector<int> increasing_dupes(int n) { > >>> vector<int> v(n); > >>> int r = (int)sqrt(n); > >>> for (int i = 0; i < n; ++i) { > >>> v[i] = i / r; > >>> } > >>> return v; > >>> } > >>> > >>> > >>> vector<int> mostly_increasing(int n) { > >>> vector<int> v(n); > >>> int r = (int)sqrt(n); > >>> for (int i = 0; i < n; ++i) { > >>> v[i] = i + rand() % r; > >>> } > >>> return v; > >>> } > >>> > >>> > >>> vector<int> decreasing_unique(int n) { > >>> vector<int> v(n); > >>> for (int i = 0; i < n; ++i) { > >>> v[i] = n - 1 - i; > >>> } > >>> return v; > >>> } > >>> > >>> vector<int> decreasing_dupes(int n) { > >>> vector<int> v(n); > >>> int r = (int)sqrt(n); > >>> for (int i = 0; i < n; ++i) { > >>> v[i] = (n - 1 - i) / r; > >>> } > >>> return v; > >>> } > >>> > >>> vector<int> mostly_decreasing(int n) { > >>> vector<int> v(n); > >>> int r = (int)sqrt(n); > >>> for (int i = 0; i < n; ++i) { > >>> v[i] = n - 1 - i + rand() % r; > >>> } > >>> return v; > >>> } > >>> > >>> vector<int> sawtooth_unique(int n) { > >>> int r = (int)sqrt(n); > >>> while (gcd(r, n) != 1) ++r; > >>> vector<int> v(n); > >>> for (int i = 0; i < n; ++i) { > >>> v[(i * r) % n] = i; > >>> } > >>> return v; > >>> } > >>> > >>> vector<int> sawtooth_dupes(int n) { > >>> int r = (int)sqrt(n); > >>> vector<int> v(n); > >>> for (int i = 0; i < n; ++i) { > >>> v[i] = i % r; > >>> } > >>> return v; > >>> } > >>> > >>> vector<int> increasing_with_swap(int n) { > >>> vector<int> v(n); > >>> for (int i = 0; i < n; ++i) { > >>> v[i] = i; > >>> } > >>> swap(v[0], v[n - 1]); > >>> return v; > >>> } > >>> > >>> vector<int> increasing_fast_slow(int n) { > >>> int r = (int)sqrt(n); > >>> vector<int> v(n); > >>> for (int i = 0; i < r; ++i) { > >>> v[i] = i * r; > >>> } > >>> for (int i = r; i < n; ++i) { > >>> v[i] = (i - r) * n / (n - r); > >>> } > >>> return v; > >>> } > >>> > >>> > >>> #define BENCHMARK(dist, n) benchmark(#dist, dist, n) > >>> > >>> int main() { > >>> int n = 100000; > >>> BENCHMARK(random_unique, n); > >>> BENCHMARK(random_dupes, n); > >>> BENCHMARK(increasing_unique, n); > >>> BENCHMARK(increasing_dupes, n); > >>> BENCHMARK(mostly_increasing, n); > >>> BENCHMARK(decreasing_unique, n); > >>> BENCHMARK(decreasing_dupes, n); > >>> BENCHMARK(mostly_decreasing, n); > >>> BENCHMARK(sawtooth_unique, n); > >>> BENCHMARK(sawtooth_dupes, n); > >>> BENCHMARK(increasing_with_swap, n); > >>> BENCHMARK(increasing_fast_slow, n); > >>> > >>> return 0; > >>> } > >> > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: __inplace_stable_sort improvement 2021-11-25 19:51 ` David de Kloet @ 2021-12-10 15:31 ` David de Kloet 0 siblings, 0 replies; 6+ messages in thread From: David de Kloet @ 2021-12-10 15:31 UTC (permalink / raw) To: François Dumont; +Cc: libstdc++ I had another look at this and decided to try `make check-performance` as you suggested, but I got make: *** No rule to make target 'check-performance'. Stop. Then I tried to run `configure` (from another directory) and got Building GCC requires GMP 4.2+, MPFR 3.1.0+ and MPC 0.8.0+. ... Since I have no idea what I'm doing, I thought I'd better check if I'm missing something obvious before trying to figure it out. On Thu, Nov 25, 2021 at 8:51 PM David de Kloet <dskloet@gmail.com> wrote: > On Thu, Nov 25, 2021 at 6:35 AM François Dumont <frs.dumont@gmail.com> > wrote: > > > > On 24/11/21 7:48 pm, David de Kloet wrote: > > > On Wed, Nov 24, 2021 at 7:33 PM François Dumont <frs.dumont@gmail.com> > wrote: > > >> > > >> On 18/11/21 1:58 pm, David de Kloet via Libstdc++ wrote: > > >>> Hello libstdc++, > > >>> > > >>> Apologies if this is the wrong list for my message. I believe I have > an > > >>> improvement for __inplace_stable_sort and I'm wondering if this is > worth > > >>> pursuing and if so how to proceed. > > >> Of course, anyone interested in enhancing the library is welcome. > > >> > > >> The best way to do so would be to provide a patch on the current > library > > >> code. I'm the last one to have touch to this algo and it would be > easier > > >> for me to understand if you try to implement it directly on the > library. > > > _my_merge_without_buffer below preprocesses exactly to > > > __merge_without_buffer from the library if you undefine NEW_ALGO so > > > you can see exactly what my changes are. > > > > > > I currently don't have much time, but when I do I'd be happy to > > > provide it as a patch. If you find time before me, you're also more > > > than welcome to use the code. > > > > > >> The good news is that you are trying to enhance a piece of code rarely > > >> used as it is only when the buffer allocation fails. So it is less > > >> critical and so perhaps more open to modifications. > > >> > > >> During my last work on it I've added a performance test: > > >> performance/25_algorithms/inplace_merge.cc. It is also supposed to > check > > >> performance when we fail to get a buffer and so use > > >> __merge_without_buffer. Use 'make check-performance' to run it, see > > >> library manual guide. Do not hesitate to change it if you think that > > >> your benches are better. > > > I don't currently have the environment to `make check-performance`. I > > > just copied the code from the web and adjusted it. But (again) when I > > > have time, I'll look into that. > > > > > > If you know of any "contributing to stdlibc++" guides, that would be > great too. > > > > > https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html > > > > > > > > > >>> __inplace_stable_sort does a merge sort where in each recursive step > the 2 > > >>> sublists are merged in-place. The merge step is itself also > recursive. In order > > >>> to merge [A, B) and [B, C), a tail section S1 of [A, B) and a head > section S2 > > >>> of [B, C) are found such that all elements in S1 are greater than > all elements > > >>> in S2 and then those sections are swapped/rotated and the shorter > sections are > > >>> merged recursively. This is done by the function > __merge_without_buffer. > > >>> > > >>> In the current implementation, __merge_without_buffer takes the half > point of > > >>> the longer section and does a binary search (via lower_bound or > upper_bound) to > > >>> find the corresponding cut point on the other section. Then it > rotates the > > >>> sections. > > >>> > > >>> My proposal is to find the cut points such that both rotated > sections have the > > >>> same length such that they can always simply be swapped. > > >> What do you mean by swapped ? > > > I mean the first element of section 1 gets swapped with the first > > > element of section 2. The second element of section 1 gets swapped > > > with the second element of section 2. Etc. for each index. > > > > > >> I see in the code below that you are just > > >> changing how the cut positions are computed and then call the same > > >> std::rotate algo. I cannot see any reason why changing the position > > >> would make the rotate faster. > > > The implementation of rotate does a series of such section swaps. But > > > if both section have the same length, it's only a single swap of > > > sections which makes rotate about 1.5-2 times faster. > > > > Ok, I see now, you try to use as much as possible the std::swap_ranges > > in __rotate for random access iterator. Maybe it means that this > > optimization should be limited to this type of iterator. We'll see > > depending on benches. > > The binary search implementation is a bit awkward for the > bidi-iterator, so having a separate implementation for random access > iterators might improve the readability. However, if you look at my > benchmarks, it also does fewer comparisons than the current algorithm. > But rotate() doesn't do any comparisons. So there must be another > benefit to choosing the cut points this way, but I'm not entirely sure > why. > > > > > I'll have a try then when I have time. > > > > Thanks > > > > > > > > The purpose of the code I added is to choose the cut positions such > > > that the rotated sections are of equal length, which makes rotate > > > faster by doing fewer element swaps, resulting in fewer calls to the > > > copy/assignment constructor. > > > > > >>> I have implemented > > >>> this and it seems to perform better in general and in the special > cases that > > >>> I've tried. > > >>> > > >>> Unfortunately, since we are no longer guaranteed to halve the longer > section in > > >>> each step, there is a possibility for a stack overflow. To avoid > this, the new > > >>> algorithm falls back to the old behavior whenever the sections have > > >>> significantly different lengths. > > >>> > > >>> Also, where the current algorithm performs the binary search > implicitly via > > >>> lower_bound/upper_bound, the new algorithm has to implement the > binary search > > >>> explicitly because it has to look at both sections to determine the > next step > > >>> in the search. > > >>> > > >>> I have created wrappers for my value objects as well as my iterators > to count > > >>> the number of operations performed by the current and my algorithm > and run > > >>> benchmarks on various distributions. My code is at the bottom but > below is the > > >>> output as run on my machine. For each operation it outputs the ratio > between > > >>> the number of operations performed by my algorithm over the number > performed by > > >>> the current algorithm. > > >>> > > >>> Thanks for your attention, > > >>> David > > >>> > > >>> > > >>> Distribution: random_unique(100000) > > >>> Example: [19, 13, 5, 0, 10, 6, 15, 11, 16, 9, 14, 18, 4, 17, 1, 7, > 8, 12, 3, 2] > > >>> > > >>> move: 0.588026 = 3331848 / 5666154 > > >>> moveAssign: 0.594789 = 6852819 / 11521431 > > >>> compare: 0.914997 = 2028977 / 2217469 > > >>> iteratorDeref: 0.718466 = 11001863 / 15312994 > > >>> iteratorCopy: 0.791332 = 33977189 / 42936712 > > >>> iteratorEquals: 2.27841 = 5174606 / 2271143 > > >>> iteratorInc: 1.2143 = 7697971 / 6339449 > > >>> iteratorDec: 0.0833529 = 473556 / 5681340 > > >>> iteratorDiff: 0.444865 = 1362598 / 3062950 > > >>> iteratorSub: 0.0200683 = 6272 / 312533 > > >>> iteratorAdd: 0.0695467 = 57149 / 821736 > > >>> iteratorAddAssign: 1.36491 = 3141949 / 2301950 > > >>> iteratorArithmetic: 0.861623 = 17914101 / 20791101 > > >>> micros: 0.663704 = 650825 / 980596 > > >>> > > >>> Distribution: random_dupes(100000) > > >>> Example: [4, 4, 1, 3, 0, 3, 2, 2, 4, 4, 0, 3, 1, 0, 3, 2, 0, 2, 1, 1] > > >>> > > >>> move: 0.570041 = 2949623 / 5174401 > > >>> moveAssign: 0.577736 = 6087819 / 10537375 > > >>> compare: 0.845419 = 1530877 / 1810790 > > >>> iteratorDeref: 0.678233 = 9232869 / 13613114 > > >>> iteratorCopy: 0.742699 = 25652244 / 34539225 > > >>> iteratorEquals: 2.41643 = 4123881 / 1706601 > > >>> iteratorInc: 1.15617 = 6554374 / 5669028 > > >>> iteratorDec: 0.101004 = 539991 / 5346219 > > >>> iteratorDiff: 0.417093 = 959972 / 2301580 > > >>> iteratorSub: 0.0350046 = 8882 / 253738 > > >>> iteratorAdd: 0.0972571 = 64122 / 659304 > > >>> iteratorAddAssign: 1.2279 = 2247197 / 1830107 > > >>> iteratorArithmetic: 0.81605 = 14498419 / 17766577 > > >>> micros: 0.699386 = 554426 / 792732 > > >>> > > >>> Distribution: increasing_unique(100000) > > >>> Example: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > 17, 18, 19] > > >>> > > >>> move: 1 = 91808 / 91808 > > >>> moveAssign: 1 = 91808 / 91808 > > >>> compare: 0.573591 = 224558 / 391495 > > >>> iteratorDeref: 0.722385 = 540924 / 748803 > > >>> iteratorCopy: 0.298175 = 976735 / 3275708 > > >>> iteratorEquals: 0.522302 = 116383 / 222827 > > >>> iteratorInc: 0.715444 = 132750 / 185549 > > >>> iteratorDec: 1 = 91808 / 91808 > > >>> iteratorDiff: 0.217455 = 40956 / 188342 > > >>> iteratorAdd: 1 = 16383 / 16383 > > >>> iteratorAddAssign: 0.299523 = 81884 / 273381 > > >>> iteratorArithmetic: 0.49082 = 480164 / 978290 > > >>> micros: 0.479792 = 22935 / 47802 > > >>> > > >>> Distribution: increasing_dupes(100000) > > >>> Example: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4] > > >>> > > >>> move: 1 = 91808 / 91808 > > >>> moveAssign: 1 = 91808 / 91808 > > >>> compare: 0.573591 = 224558 / 391495 > > >>> iteratorDeref: 0.722385 = 540924 / 748803 > > >>> iteratorCopy: 0.298175 = 976735 / 3275708 > > >>> iteratorEquals: 0.522302 = 116383 / 222827 > > >>> iteratorInc: 0.715444 = 132750 / 185549 > > >>> iteratorDec: 1 = 91808 / 91808 > > >>> iteratorDiff: 0.217455 = 40956 / 188342 > > >>> iteratorAdd: 1 = 16383 / 16383 > > >>> iteratorAddAssign: 0.299523 = 81884 / 273381 > > >>> iteratorArithmetic: 0.49082 = 480164 / 978290 > > >>> micros: 0.449154 = 20675 / 46031 > > >>> > > >>> Distribution: mostly_increasing(100000) > > >>> Example: [3, 2, 3, 4, 5, 5, 6, 7, 9, 9, 11, 13, 12, 14, 15, 17, 17, > 18, 21, 20] > > >>> > > >>> move: 0.681093 = 814725 / 1196203 > > >>> moveAssign: 0.703421 = 1809563 / 2572519 > > >>> compare: 0.924016 = 965320 / 1044700 > > >>> iteratorDeref: 0.847765 = 3802634 / 4485480 > > >>> iteratorCopy: 0.804068 = 11859111 / 14748894 > > >>> iteratorEquals: 1.59826 = 1388752 / 868913 > > >>> iteratorInc: 1.25659 = 1818421 / 1447111 > > >>> iteratorDec: 0.383738 = 537108 / 1399674 > > >>> iteratorDiff: 0.486488 = 562273 / 1155779 > > >>> iteratorSub: 0.0638659 = 6666 / 104375 > > >>> iteratorAdd: 0.184194 = 55586 / 301780 > > >>> iteratorAddAssign: 1.30369 = 1105439 / 847932 > > >>> iteratorArithmetic: 0.893672 = 5474245 / 6125564 > > >>> micros: 0.790603 = 233953 / 295917 > > >>> > > >>> Distribution: decreasing_unique(100000) > > >>> Example: [19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, > 3, 2, 1, 0] > > >>> > > >>> move: 0.40597 = 995312 / 2451688 > > >>> moveAssign: 0.45785 = 2459840 / 5372592 > > >>> compare: 0.489247 = 151631 / 309927 > > >>> iteratorDeref: 0.524798 = 3415934 / 6509043 > > >>> iteratorCopy: 0.563628 = 5971275 / 10594360 > > >>> iteratorEquals: 1.18395 = 843533 / 712475 > > >>> iteratorInc: 0.864454 = 1460463 / 1689462 > > >>> iteratorDec: 0.373125 = 1606976 / 4306800 > > >>> iteratorDiff: 0.495226 = 235930 / 476409 > > >>> iteratorSub: 0.248254 = 13824 / 55685 > > >>> iteratorAdd: 0.590877 = 135839 / 229894 > > >>> iteratorAddAssign: 0.421852 = 119646 / 283621 > > >>> iteratorArithmetic: 0.569514 = 4416211 / 7754346 > > >>> micros: 0.543145 = 167865 / 309061 > > >>> > > >>> Distribution: decreasing_dupes(100000) > > >>> Example: [4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0] > > >>> > > >>> move: 0.741695 = 1455276 / 1962096 > > >>> moveAssign: 0.736048 = 2826598 / 3840238 > > >>> compare: 0.641679 = 282851 / 440798 > > >>> iteratorDeref: 0.740565 = 3386062 / 4572271 > > >>> iteratorCopy: 0.628798 = 5008472 / 7965150 > > >>> iteratorEquals: 1.50206 = 794354 / 528843 > > >>> iteratorInc: 1.39602 = 2164546 / 1550510 > > >>> iteratorDec: 0.336639 = 837728 / 2488504 > > >>> iteratorDiff: 0.313457 = 88552 / 282501 > > >>> iteratorSub: 0.165388 = 3880 / 23460 > > >>> iteratorAdd: 0.377644 = 27139 / 71864 > > >>> iteratorAddAssign: 0.551364 = 182383 / 330785 > > >>> iteratorArithmetic: 0.776766 = 4098582 / 5276467 > > >>> micros: 0.71282 = 160398 / 225019 > > >>> > > >>> Distribution: mostly_decreasing(100000) > > >>> Example: [20, 19, 20, 18, 18, 15, 13, 13, 14, 13, 10, 9, 10, 8, 5, 7, > > >>> 6, 4, 3, 2] > > >>> > > >>> move: 0.760602 = 2197435 / 2889075 > > >>> moveAssign: 0.768473 = 4591313 / 5974593 > > >>> compare: 0.932913 = 977678 / 1047984 > > >>> iteratorDeref: 0.838641 = 6633169 / 7909424 > > >>> iteratorCopy: 0.829028 = 15447960 / 18633830 > > >>> iteratorEquals: 1.7009 = 2000686 / 1176251 > > >>> iteratorInc: 1.52771 = 3788203 / 2479657 > > >>> iteratorDec: 0.363843 = 1369990 / 3765332 > > >>> iteratorDiff: 0.489186 = 595472 / 1217270 > > >>> iteratorSub: 0.0877827 = 10925 / 124455 > > >>> iteratorAdd: 0.201991 = 69640 / 344767 > > >>> iteratorAddAssign: 1.32756 = 1133512 / 853830 > > >>> iteratorArithmetic: 0.900303 = 8968428 / 9961562 > > >>> micros: 0.846298 = 377504 / 446065 > > >>> > > >>> Distribution: sawtooth_unique(100000) > > >>> Example: [0, 3, 6, 9, 12, 15, 18, 1, 4, 7, 10, 13, 16, 19, 2, 5, 8, > 11, 14, 17] > > >>> > > >>> move: 0.569953 = 2912465 / 5110007 > > >>> moveAssign: 0.577701 = 6012427 / 10407511 > > >>> compare: 0.824488 = 1604387 / 1945920 > > >>> iteratorDeref: 0.678979 = 9299794 / 13696724 > > >>> iteratorCopy: 0.72546 = 26157352 / 36056250 > > >>> iteratorEquals: 2.00683 = 4131766 / 2058856 > > >>> iteratorInc: 1.10728 = 6498257 / 5868688 > > >>> iteratorDec: 0.111225 = 552519 / 4967557 > > >>> iteratorDiff: 0.405328 = 988144 / 2437889 > > >>> iteratorSub: 0.026249 = 6327 / 241038 > > >>> iteratorAdd: 0.089034 = 55918 / 628052 > > >>> iteratorAddAssign: 1.18737 = 2297233 / 1934717 > > >>> iteratorArithmetic: 0.801143 = 14530164 / 18136797 > > >>> micros: 0.706282 = 583937 / 826776 > > >>> > > >>> Distribution: sawtooth_dupes(100000) > > >>> Example: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3] > > >>> > > >>> move: 0.547721 = 2348873 / 4288447 > > >>> moveAssign: 0.543252 = 4613817 / 8492965 > > >>> compare: 0.664642 = 841503 / 1266099 > > >>> iteratorDeref: 0.599656 = 6292282 / 10493159 > > >>> iteratorCopy: 0.60663 = 15336816 / 25281976 > > >>> iteratorEquals: 2.24803 = 2859760 / 1272118 > > >>> iteratorInc: 1.07142 = 4999546 / 4666276 > > >>> iteratorDec: 0.0490551 = 205953 / 4198405 > > >>> iteratorDiff: 0.314518 = 473157 / 1504390 > > >>> iteratorSub: 0.0371951 = 5998 / 161258 > > >>> iteratorAdd: 0.0875684 = 34672 / 395942 > > >>> iteratorAddAssign: 0.900009 = 1233077 / 1370072 > > >>> iteratorArithmetic: 0.72316 = 9812163 / 13568461 > > >>> micros: 0.610891 = 353102 / 578011 > > >>> > > >>> Distribution: increasing_with_swap(100000) > > >>> Example: [19, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > 17, 18, 0] > > >>> > > >>> move: 1 = 291782 / 291782 > > >>> moveAssign: 1 = 491779 / 491779 > > >>> compare: 0.57449 = 225064 / 391763 > > >>> iteratorDeref: 0.819553 = 941922 / 1149312 > > >>> iteratorCopy: 0.377964 = 1392386 / 3683915 > > >>> iteratorEquals: 0.525758 = 117408 / 223312 > > >>> iteratorInc: 0.863708 = 333050 / 385605 > > >>> iteratorDec: 0.999753 = 291716 / 291788 > > >>> iteratorDiff: 0.224408 = 42528 / 189512 > > >>> iteratorSub: 0.888889 = 104 / 117 > > >>> iteratorAdd: 0.997151 = 16801 / 16849 > > >>> iteratorAddAssign: 0.302604 = 82864 / 273836 > > >>> iteratorArithmetic: 0.640448 = 884471 / 1381019 > > >>> micros: 0.616124 = 36233 / 58808 > > >>> > > >>> Distribution: increasing_fast_slow(100000) > > >>> Example: [0, 4, 8, 12, 0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, > 16, 17, 18] > > >>> > > >>> move: 0.640348 = 825772 / 1289568 > > >>> moveAssign: 0.627081 = 1559792 / 2487384 > > >>> compare: 0.714235 = 431286 / 603843 > > >>> iteratorDeref: 0.694913 = 2421875 / 3485151 > > >>> iteratorCopy: 0.637176 = 5645877 / 8860786 > > >>> iteratorEquals: 1.7552 = 953880 / 543458 > > >>> iteratorInc: 1.08941 = 1676609 / 1539005 > > >>> iteratorDec: 0.135904 = 154533 / 1137073 > > >>> iteratorDiff: 0.388132 = 192787 / 496705 > > >>> iteratorSub: 0.00577964 = 141 / 24396 > > >>> iteratorAdd: 0.207287 = 17272 / 83324 > > >>> iteratorAddAssign: 0.849015 = 431320 / 508024 > > >>> iteratorArithmetic: 0.790987 = 3426542 / 4331985 > > >>> micros: 0.777699 = 153552 / 197444 > > >>> > > >>> > > >>> > > >>> #include <iostream> > > >>> #include <stdio.h> > > >>> #include <vector> > > >>> #include <cstdlib> > > >>> #include <algorithm> > > >>> #include <functional> > > >>> #include <ranges> > > >>> #include <chrono> > > >>> #include <cmath> > > >>> #include <numeric> > > >>> > > >>> using namespace std; > > >>> > > >>> #define LINE0 0 > > >>> #define cout_line (cout << "line " << __LINE__ - LINE0 << ": ") > > >>> #define vout(v) #v << " = " << (v) > > >>> #define print1(v) cout_line << vout(v) << endl > > >>> #define print2(v1, v2) cout_line << vout(v1) << ", " << vout(v2) << > endl > > >>> #define print3(v1, v2, v3) cout_line << vout(v1) << ", " << vout(v2) > > >>> << ", " << vout(v3) << endl > > >>> #define GET_MACRO(_1,_2,_3,NAME,...) NAME > > >>> #define print(...) GET_MACRO(__VA_ARGS__, print3, print2, > print1)(__VA_ARGS__) > > >>> > > >>> template <class Ch, class Tr, class Container> > > >>> basic_ostream <Ch, Tr> & operator << (basic_ostream <Ch, Tr> & os, > > >>> Container const& x); > > >>> > > >>> template <class X, class Y> > > >>> ostream & operator << (ostream & os, pair <X, Y> const& p) { > > >>> return os << p.first << ":" << p.second; > > >>> } > > >>> > > >>> template <class Ch, class Tr, class Container> > > >>> basic_ostream <Ch, Tr> & operator << (basic_ostream <Ch, Tr> & os, > > >>> Container const& x) { > > >>> os << "["; > > >>> bool first = true; > > >>> for(auto const& y : x) { > > >>> if (!first) os << ", "; > > >>> os << y; > > >>> first = false; > > >>> } > > >>> return os << "]"; > > >>> } > > >>> > > >>> > > >>> struct Counter { > > >>> int construct = 0; > > >>> int copy = 0; > > >>> int assign = 0; > > >>> int move = 0; > > >>> int moveAssign = 0; > > >>> int compare = 0; > > >>> int iteratorDeref = 0; > > >>> int iteratorCopy = 0; > > >>> int iteratorEquals = 0; > > >>> int iteratorInc = 0; > > >>> int iteratorDec = 0; > > >>> int iteratorDiff = 0; > > >>> int iteratorSub = 0; > > >>> int iteratorAdd = 0; > > >>> int iteratorAddAssign = 0; > > >>> int iteratorArithmetic = 0; > > >>> int micros = 0; > > >>> }; > > >>> > > >>> int next_element_id = 0; > > >>> > > >>> class Element { > > >>> Counter* counter; > > >>> public: > > >>> int val; > > >>> int id; // Distinguis identical values to verify stability. > > >>> Element(int val, Counter* counter) > > >>> : val(val), counter(counter), id(++next_element_id) { > > >>> ++counter->construct; > > >>> } > > >>> > > >>> Element(Element const& other) > > >>> : val(other.val), counter(other.counter), id(other.id) { > > >>> ++counter->copy; > > >>> } > > >>> > > >>> Element& operator=(Element const& other) { > > >>> val = other.val; > > >>> counter = other.counter; > > >>> id = other.id; > > >>> ++counter->assign; > > >>> return *this; > > >>> } > > >>> > > >>> Element(Element&& other) noexcept > > >>> : val(move(other.val)), counter(other.counter), id(other.id) > { > > >>> ++counter->move; > > >>> } > > >>> > > >>> Element& operator=(Element&& other) noexcept { > > >>> val = move(other.val); > > >>> counter = other.counter; > > >>> id = other.id; > > >>> ++counter->moveAssign; > > >>> return *this; > > >>> } > > >>> > > >>> bool operator<(Element const& other) const { > > >>> ++counter->compare; > > >>> return val < other.val; > > >>> } > > >>> > > >>> bool operator==(Element const& other) const { > > >>> return val == other.val && id == other.id; > > >>> } > > >>> }; > > >>> > > >>> ostream & operator << (ostream & os, Element const& e) { > > >>> return os << e.val << "(" << e.id << ")"; > > >>> } > > >>> > > >>> > > >>> class RandomIterator { > > >>> typedef vector<Element>::iterator It; > > >>> It it; > > >>> Counter* counter; > > >>> > > >>> public: > > >>> using iterator_category = std::random_access_iterator_tag; > > >>> using difference_type = std::ptrdiff_t; > > >>> using value_type = Element; > > >>> using pointer = value_type*; > > >>> using reference = value_type&; > > >>> > > >>> RandomIterator(It it, Counter* counter) : it(it), > counter(counter) { > > >>> } > > >>> > > >>> RandomIterator(RandomIterator const& other) : it(other.it), > > >>> counter(other.counter) { > > >>> counter->iteratorCopy += 1; > > >>> } > > >>> > > >>> bool operator== (RandomIterator other) { > > >>> counter->iteratorEquals += 1; > > >>> counter->iteratorArithmetic += 1; > > >>> return it == other.it; > > >>> } > > >>> > > >>> RandomIterator operator++ () { > > >>> counter->iteratorInc += 1; > > >>> counter->iteratorArithmetic += 1; > > >>> return RandomIterator(++it, counter); > > >>> } > > >>> > > >>> RandomIterator operator-- () { > > >>> counter->iteratorDec += 1; > > >>> counter->iteratorArithmetic += 1; > > >>> return RandomIterator(--it, counter); > > >>> } > > >>> > > >>> Element& operator* () { > > >>> counter->iteratorDeref += 1; > > >>> return *it; > > >>> } > > >>> > > >>> difference_type operator- (RandomIterator other) { > > >>> counter->iteratorDiff += 1; > > >>> counter->iteratorArithmetic += 1; > > >>> return it - other.it; > > >>> } > > >>> > > >>> RandomIterator operator- (difference_type diff) { > > >>> counter->iteratorSub += 1; > > >>> counter->iteratorArithmetic += 1; > > >>> return RandomIterator(it - diff, counter); > > >>> } > > >>> > > >>> RandomIterator operator+ (difference_type diff) { > > >>> counter->iteratorAdd += 1; > > >>> counter->iteratorArithmetic += 1; > > >>> return RandomIterator(it + diff, counter); > > >>> } > > >>> > > >>> RandomIterator operator+= (difference_type diff) { > > >>> counter->iteratorAddAssign += 1; > > >>> counter->iteratorArithmetic += 1; > > >>> return RandomIterator(it += diff, counter); > > >>> } > > >>> }; > > >>> > > >>> > > >>> class TestCase { > > >>> Counter counter; > > >>> vector<Element> elements; > > >>> > > >>> public: > > >>> TestCase(vector<int> const& data) { > > >>> elements.reserve(data.size()); > > >>> for (int i: data) { > > >>> elements.push_back(Element(i, &counter)); > > >>> } > > >>> } > > >>> > > >>> Counter sort( > > >>> void algo(RandomIterator, RandomIterator), > > >>> vector<Element>::iterator begin, > > >>> vector<Element>::iterator end) { > > >>> counter = Counter(); > > >>> auto t0 = std::chrono::system_clock::now(); > > >>> > > >>> algo(RandomIterator(begin, &counter), RandomIterator(end, > &counter)); > > >>> > > >>> auto t1 = std::chrono::system_clock::now(); > > >>> counter.micros = > > >>> std::chrono::duration_cast<std::chrono::microseconds>(t1 - > > >>> t0).count(); > > >>> return counter; > > >>> } > > >>> > > >>> pair<Counter,Counter> benchmark( > > >>> void new_algo(RandomIterator, RandomIterator), > > >>> void cur_algo(RandomIterator, RandomIterator)) { > > >>> vector<Element> toSort1 = elements; > > >>> Counter counter1 = sort(new_algo, toSort1.begin(), > toSort1.end()); > > >>> > > >>> vector<Element> toSort2 = elements; > > >>> Counter counter2 = sort(cur_algo, toSort2.begin(), > toSort2.end()); > > >>> > > >>> if (toSort1 != toSort2) { > > >>> cout << "INCORRECT RESULT!" << endl; > > >>> exit(1); > > >>> } > > >>> > > >>> return make_pair(counter1, counter2); > > >>> } > > >>> }; > > >>> > > >>> > > >>> void relative_metric(string name, int m1, int m2) { > > >>> if (m1 == 0 && m2 == 0) return; > > >>> if (m2 == 0) { > > >>> cout << name << ": inf = " << m1 << " / " << m2 << endl; > > >>> } else { > > >>> cout << name << ": " << m1 / (double) m2 << " = " << m1 << " / > " > > >>> << m2 << endl; > > >>> } > > >>> } > > >>> > > >>> > > >>> void relative_report(Counter const& c1, Counter const& c2) { > > >>> cout << endl; > > >>> relative_metric("construct", c1.construct, c2.construct); > > >>> relative_metric("copy", c1.copy, c2.copy); > > >>> relative_metric("assign", c1.assign, c2.assign); > > >>> relative_metric("move", c1.move, c2.move); > > >>> relative_metric("moveAssign", c1.moveAssign, c2.moveAssign); > > >>> relative_metric("compare", c1.compare, c2.compare); > > >>> relative_metric("iteratorDeref", c1.iteratorDeref, > c2.iteratorDeref); > > >>> relative_metric("iteratorCopy", c1.iteratorCopy, > c2.iteratorCopy); > > >>> relative_metric("iteratorEquals", c1.iteratorEquals, > c2.iteratorEquals); > > >>> relative_metric("iteratorInc", c1.iteratorInc, c2.iteratorInc); > > >>> relative_metric("iteratorDec", c1.iteratorDec, c2.iteratorDec); > > >>> relative_metric("iteratorDiff", c1.iteratorDiff, > c2.iteratorDiff); > > >>> relative_metric("iteratorSub", c1.iteratorSub, c2.iteratorSub); > > >>> relative_metric("iteratorAdd", c1.iteratorAdd, c2.iteratorAdd); > > >>> relative_metric("iteratorAddAssign", c1.iteratorAddAssign, > > >>> c2.iteratorAddAssign); > > >>> relative_metric("iteratorArithmetic", c1.iteratorArithmetic, > > >>> c2.iteratorArithmetic); > > >>> relative_metric("micros", c1.micros, c2.micros); > > >>> cout << endl; > > >>> } > > >>> > > >>> > > >>> /// This is a helper function for the merge routines. > > >>> template<typename _BidirectionalIterator, typename _Distance, > typename _Compare> > > >>> void _my_merge_without_buffer( > > >>> _BidirectionalIterator __first, > > >>> _BidirectionalIterator __middle, > > >>> _BidirectionalIterator __last, > > >>> _Distance __len1, _Distance __len2, > > >>> _Compare __comp) { > > >>> > > >>> if (__len1 == 0 || __len2 == 0) > > >>> return; > > >>> > > >>> if (__len1 + __len2 == 2) { > > >>> if (__comp(__middle, __first)) > > >>> std::iter_swap(__first, __middle); > > >>> return; > > >>> } > > >>> > > >>> _BidirectionalIterator __first_cut = __first; > > >>> _BidirectionalIterator __second_cut = __middle; > > >>> _Distance __len11 = 0; > > >>> _Distance __len22 = 0; > > >>> > > >>> > > >>> #define NEW_ALGO 1 > > >>> > > >>> #ifdef NEW_ALGO > > >>> // Prevent stack overflow > > >>> if (__len1 > 4 * __len2 || __len2 > 4 * __len1) { > > >>> #endif > > >>> if (__len1 > __len2) > > >>> { > > >>> __len11 = __len1 / 2; > > >>> std::advance(__first_cut, __len11); > > >>> __second_cut > > >>> = std::__lower_bound(__middle, __last, *__first_cut, > > >>> __gnu_cxx::__ops::__iter_comp_val(__comp)); > > >>> __len22 = std::distance(__middle, __second_cut); > > >>> } > > >>> else > > >>> { > > >>> __len22 = __len2 / 2; > > >>> std::advance(__second_cut, __len22); > > >>> __first_cut > > >>> = std::__upper_bound(__first, __middle, *__second_cut, > > >>> __gnu_cxx::__ops::__val_comp_iter(__comp)); > > >>> __len11 = std::distance(__first, __first_cut); > > >>> } > > >>> #ifdef NEW_ALGO > > >>> } else > > >>> { > > >>> _Distance __cut_range; > > >>> // __cut_len is always equal to __middle - __first_cut (and > eventually to > > >>> // __second_cut - __middle) but we keep track of it to avoid > calculating it > > >>> // later which can be expensive for bidirectional iterators. > > >>> _Distance __cut_len; > > >>> > > >>> if (__len1 > __len2) > > >>> { > > >>> std::advance(__first_cut, __len1 - __len2); > > >>> __cut_range = __len2; > > >>> } > > >>> else > > >>> { > > >>> __cut_range = __len1; > > >>> } > > >>> __cut_len = __cut_range; > > >>> > > >>> while (__cut_range > 0) { > > >>> _Distance __half_cut_range = __cut_range / 2; > > >>> _BidirectionalIterator __mid_first_cut = __first_cut; > > >>> std::advance(__mid_first_cut, __cut_range - __half_cut_range - 1); > > >>> _BidirectionalIterator __mid_second_cut = __second_cut; > > >>> std::advance(__mid_second_cut, __half_cut_range); > > >>> > > >>> if (__comp(__mid_second_cut, __mid_first_cut)) { > > >>> __cut_range -= __half_cut_range + 1; > > >>> __second_cut = __mid_second_cut; > > >>> ++__second_cut; > > >>> } else { > > >>> __cut_len -= __cut_range - __half_cut_range; > > >>> __cut_range = __half_cut_range; > > >>> __first_cut = __mid_first_cut; > > >>> ++__first_cut; > > >>> } > > >>> } > > >>> __len22 = __cut_len; > > >>> __len11 = __len1 - __cut_len; > > >>> } > > >>> #endif > > >>> > > >>> _BidirectionalIterator __new_middle > > >>> = std::rotate(__first_cut, __middle, __second_cut); > > >>> _my_merge_without_buffer(__first, __first_cut, __new_middle, > > >>> __len11, __len22, __comp); > > >>> _my_merge_without_buffer(__new_middle, __second_cut, __last, > > >>> __len1 - __len11, __len2 - __len22, __comp); > > >>> } > > >>> > > >>> /// This is a helper function for the stable sorting routines. > > >>> template<typename _RandomAccessIterator, typename _Compare> > > >>> void > > >>> _my_inplace_stable_sort(_RandomAccessIterator __first, > > >>> _RandomAccessIterator __last, _Compare __comp) > > >>> { > > >>> > > >>> if (__last - __first < 15) > > >>> { > > >>> std::__insertion_sort(__first, __last, __comp); > > >>> return; > > >>> } > > >>> _RandomAccessIterator __middle = __first + (__last - __first) > / 2; > > >>> _my_inplace_stable_sort(__first, __middle, __comp); > > >>> _my_inplace_stable_sort(__middle, __last, __comp); > > >>> _my_merge_without_buffer(__first, __middle, __last, > > >>> __middle - __first, > > >>> __last - __middle, > > >>> __comp); > > >>> } > > >>> > > >>> > > >>> template<typename Iterator> > > >>> void my_inplace_stable_sort(Iterator start, Iterator end) { > > >>> _my_inplace_stable_sort(start, end, > __gnu_cxx::__ops::__iter_less_iter()); > > >>> } > > >>> > > >>> template<typename Iterator> > > >>> void std_inplace_stable_sort(Iterator start, Iterator end) { > > >>> std::__inplace_stable_sort(start, end, > __gnu_cxx::__ops::__iter_less_iter()); > > >>> } > > >>> > > >>> void benchmark(string const& dist_name, > > >>> vector<int> dist_func(int), > > >>> int n) { > > >>> vector<int> v = dist_func(n); > > >>> cout << "Distribution: " << dist_name << "(" << n << ")" << endl; > > >>> cout << "Example: " << dist_func(20) << endl; > > >>> TestCase testCase(v); > > >>> > > >>> pair<Counter, Counter> counter_pair = > > >>> testCase.benchmark( > > >>> my_inplace_stable_sort > > >>> , std_inplace_stable_sort > > >>> ); > > >>> Counter cKloet = counter_pair.first; > > >>> Counter cInplace = counter_pair.second; > > >>> > > >>> relative_report(cKloet, cInplace); > > >>> } > > >>> > > >>> > > >>> vector<int> random_unique(int n) { > > >>> vector<int> v(n); > > >>> for (int i = 0; i < n; ++i) { > > >>> v[i] = i; > > >>> } > > >>> for (int i = n - 1; i >= 0; --i) { > > >>> swap(v[i], v[rand() % (i + 1)]); > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> vector<int> random_dupes(int n) { > > >>> vector<int> v(n); > > >>> int r = (int)sqrt(n); > > >>> for (int i = 0; i < n; ++i) { > > >>> v[i] = i / r; > > >>> } > > >>> for (int i = n - 1; i >= 0; --i) { > > >>> swap(v[i], v[rand() % (i + 1)]); > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> vector<int> increasing_unique(int n) { > > >>> vector<int> v(n); > > >>> for (int i = 0; i < n; ++i) { > > >>> v[i] = i; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> > > >>> vector<int> increasing_dupes(int n) { > > >>> vector<int> v(n); > > >>> int r = (int)sqrt(n); > > >>> for (int i = 0; i < n; ++i) { > > >>> v[i] = i / r; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> > > >>> vector<int> mostly_increasing(int n) { > > >>> vector<int> v(n); > > >>> int r = (int)sqrt(n); > > >>> for (int i = 0; i < n; ++i) { > > >>> v[i] = i + rand() % r; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> > > >>> vector<int> decreasing_unique(int n) { > > >>> vector<int> v(n); > > >>> for (int i = 0; i < n; ++i) { > > >>> v[i] = n - 1 - i; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> vector<int> decreasing_dupes(int n) { > > >>> vector<int> v(n); > > >>> int r = (int)sqrt(n); > > >>> for (int i = 0; i < n; ++i) { > > >>> v[i] = (n - 1 - i) / r; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> vector<int> mostly_decreasing(int n) { > > >>> vector<int> v(n); > > >>> int r = (int)sqrt(n); > > >>> for (int i = 0; i < n; ++i) { > > >>> v[i] = n - 1 - i + rand() % r; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> vector<int> sawtooth_unique(int n) { > > >>> int r = (int)sqrt(n); > > >>> while (gcd(r, n) != 1) ++r; > > >>> vector<int> v(n); > > >>> for (int i = 0; i < n; ++i) { > > >>> v[(i * r) % n] = i; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> vector<int> sawtooth_dupes(int n) { > > >>> int r = (int)sqrt(n); > > >>> vector<int> v(n); > > >>> for (int i = 0; i < n; ++i) { > > >>> v[i] = i % r; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> vector<int> increasing_with_swap(int n) { > > >>> vector<int> v(n); > > >>> for (int i = 0; i < n; ++i) { > > >>> v[i] = i; > > >>> } > > >>> swap(v[0], v[n - 1]); > > >>> return v; > > >>> } > > >>> > > >>> vector<int> increasing_fast_slow(int n) { > > >>> int r = (int)sqrt(n); > > >>> vector<int> v(n); > > >>> for (int i = 0; i < r; ++i) { > > >>> v[i] = i * r; > > >>> } > > >>> for (int i = r; i < n; ++i) { > > >>> v[i] = (i - r) * n / (n - r); > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> > > >>> #define BENCHMARK(dist, n) benchmark(#dist, dist, n) > > >>> > > >>> int main() { > > >>> int n = 100000; > > >>> BENCHMARK(random_unique, n); > > >>> BENCHMARK(random_dupes, n); > > >>> BENCHMARK(increasing_unique, n); > > >>> BENCHMARK(increasing_dupes, n); > > >>> BENCHMARK(mostly_increasing, n); > > >>> BENCHMARK(decreasing_unique, n); > > >>> BENCHMARK(decreasing_dupes, n); > > >>> BENCHMARK(mostly_decreasing, n); > > >>> BENCHMARK(sawtooth_unique, n); > > >>> BENCHMARK(sawtooth_dupes, n); > > >>> BENCHMARK(increasing_with_swap, n); > > >>> BENCHMARK(increasing_fast_slow, n); > > >>> > > >>> return 0; > > >>> } > > >> > > > ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-12-10 15:31 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-11-18 12:58 __inplace_stable_sort improvement David de Kloet 2021-11-24 18:33 ` François Dumont 2021-11-24 18:48 ` David de Kloet 2021-11-25 5:35 ` François Dumont 2021-11-25 19:51 ` David de Kloet 2021-12-10 15:31 ` David de Kloet
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).