public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
From: "François Dumont" <frs.dumont@gmail.com>
To: David de Kloet <dskloet@gmail.com>
Cc: libstdc++@gcc.gnu.org
Subject: Re: __inplace_stable_sort improvement
Date: Thu, 25 Nov 2021 06:35:05 +0100	[thread overview]
Message-ID: <54dbf38f-6409-9971-bdec-457ac6d2473a@gmail.com> (raw)
In-Reply-To: <CAKxRKif8eweyvf5ugZ-qVFwtLNqHCOBZRwZg87gwXt9ivytc_w@mail.gmail.com>

On 24/11/21 7:48 pm, David de Kloet wrote:
> On Wed, Nov 24, 2021 at 7:33 PM François Dumont <frs.dumont@gmail.com> wrote:
>>
>> On 18/11/21 1:58 pm, David de Kloet via Libstdc++ wrote:
>>> Hello libstdc++,
>>>
>>> Apologies if this is the wrong list for my message. I believe I have an
>>> improvement for __inplace_stable_sort and I'm wondering if this is worth
>>> pursuing and if so how to proceed.
>> Of course, anyone interested in enhancing the library is welcome.
>>
>> The best way to do so would be to provide a patch on the current library
>> code. I'm the last one to have touch to this algo and it would be easier
>> for me to understand if you try to implement it directly on the library.
> _my_merge_without_buffer below preprocesses exactly to
> __merge_without_buffer from the library if you undefine NEW_ALGO so
> you can see exactly what my changes are.
>
> I currently don't have much time, but when I do I'd be happy to
> provide it as a patch. If you find time before me, you're also more
> than welcome to use the code.
>
>> The good news is that you are trying to enhance a piece of code rarely
>> used as it is only when the buffer allocation fails. So it is less
>> critical and so perhaps more open to modifications.
>>
>> During my last work on it I've added a performance test:
>> performance/25_algorithms/inplace_merge.cc. It is also supposed to check
>> performance when we fail to get a buffer and so use
>> __merge_without_buffer. Use 'make check-performance' to run it, see
>> library manual guide. Do not hesitate to change it if you think that
>> your benches are better.
> I don't currently have the environment to `make check-performance`. I
> just copied the code from the web and adjusted it. But (again) when I
> have time, I'll look into that.
>
> If you know of any "contributing to stdlibc++" guides, that would be great too.

https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html


>
>>> __inplace_stable_sort does a merge sort where in each recursive step the 2
>>> sublists are merged in-place. The merge step is itself also recursive. In order
>>> to merge [A, B) and [B, C), a tail section S1 of [A, B) and a head section S2
>>> of [B, C) are found such that all elements in S1 are greater than all elements
>>> in S2 and then those sections are swapped/rotated and the shorter sections are
>>> merged recursively. This is done by the function __merge_without_buffer.
>>>
>>> In the current implementation, __merge_without_buffer takes the half point of
>>> the longer section and does a binary search (via lower_bound or upper_bound) to
>>> find the corresponding cut point on the other section. Then it rotates the
>>> sections.
>>>
>>> My proposal is to find the cut points such that both rotated sections have the
>>> same length such that they can always simply be swapped.
>> What do you mean by swapped ?
> I mean the first element of section 1 gets swapped with the first
> element of section 2. The second element of section 1 gets swapped
> with the second element of section 2. Etc. for each index.
>
>> I see in the code below that you are just
>> changing how the cut positions are computed and then call the same
>> std::rotate algo. I cannot see any reason why changing the position
>> would make the rotate faster.
> The implementation of rotate does a series of such section swaps. But
> if both section have the same length, it's only a single swap of
> sections which makes rotate about 1.5-2 times faster.

Ok, I see now, you try to use as much as possible the std::swap_ranges 
in __rotate for random access iterator. Maybe it means that this 
optimization should be limited to this type of iterator. We'll see 
depending on benches.

I'll have a try then when I have time.

Thanks

>
> The purpose of the code I added is to choose the cut positions such
> that the rotated sections are of equal length, which makes rotate
> faster by doing fewer element swaps, resulting in fewer calls to the
> copy/assignment constructor.
>
>>>    I have implemented
>>> this and it seems to perform better in general and in the special cases that
>>> I've tried.
>>>
>>> Unfortunately, since we are no longer guaranteed to halve the longer section in
>>> each step, there is a possibility for a stack overflow. To avoid this, the new
>>> algorithm falls back to the old behavior whenever the sections have
>>> significantly different lengths.
>>>
>>> Also, where the current algorithm performs the binary search implicitly via
>>> lower_bound/upper_bound, the new algorithm has to implement the binary search
>>> explicitly because it has to look at both sections to determine the next step
>>> in the search.
>>>
>>> I have created wrappers for my value objects as well as my iterators to count
>>> the number of operations performed by the current and my algorithm and run
>>> benchmarks on various distributions. My code is at the bottom but below is the
>>> output as run on my machine. For each operation it outputs the ratio between
>>> the number of operations performed by my algorithm over the number performed by
>>> the current algorithm.
>>>
>>> Thanks for your attention,
>>> David
>>>
>>>
>>> Distribution: random_unique(100000)
>>> Example: [19, 13, 5, 0, 10, 6, 15, 11, 16, 9, 14, 18, 4, 17, 1, 7, 8, 12, 3, 2]
>>>
>>> move: 0.588026 = 3331848 / 5666154
>>> moveAssign: 0.594789 = 6852819 / 11521431
>>> compare: 0.914997 = 2028977 / 2217469
>>> iteratorDeref: 0.718466 = 11001863 / 15312994
>>> iteratorCopy: 0.791332 = 33977189 / 42936712
>>> iteratorEquals: 2.27841 = 5174606 / 2271143
>>> iteratorInc: 1.2143 = 7697971 / 6339449
>>> iteratorDec: 0.0833529 = 473556 / 5681340
>>> iteratorDiff: 0.444865 = 1362598 / 3062950
>>> iteratorSub: 0.0200683 = 6272 / 312533
>>> iteratorAdd: 0.0695467 = 57149 / 821736
>>> iteratorAddAssign: 1.36491 = 3141949 / 2301950
>>> iteratorArithmetic: 0.861623 = 17914101 / 20791101
>>> micros: 0.663704 = 650825 / 980596
>>>
>>> Distribution: random_dupes(100000)
>>> Example: [4, 4, 1, 3, 0, 3, 2, 2, 4, 4, 0, 3, 1, 0, 3, 2, 0, 2, 1, 1]
>>>
>>> move: 0.570041 = 2949623 / 5174401
>>> moveAssign: 0.577736 = 6087819 / 10537375
>>> compare: 0.845419 = 1530877 / 1810790
>>> iteratorDeref: 0.678233 = 9232869 / 13613114
>>> iteratorCopy: 0.742699 = 25652244 / 34539225
>>> iteratorEquals: 2.41643 = 4123881 / 1706601
>>> iteratorInc: 1.15617 = 6554374 / 5669028
>>> iteratorDec: 0.101004 = 539991 / 5346219
>>> iteratorDiff: 0.417093 = 959972 / 2301580
>>> iteratorSub: 0.0350046 = 8882 / 253738
>>> iteratorAdd: 0.0972571 = 64122 / 659304
>>> iteratorAddAssign: 1.2279 = 2247197 / 1830107
>>> iteratorArithmetic: 0.81605 = 14498419 / 17766577
>>> micros: 0.699386 = 554426 / 792732
>>>
>>> Distribution: increasing_unique(100000)
>>> Example: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>>
>>> move: 1 = 91808 / 91808
>>> moveAssign: 1 = 91808 / 91808
>>> compare: 0.573591 = 224558 / 391495
>>> iteratorDeref: 0.722385 = 540924 / 748803
>>> iteratorCopy: 0.298175 = 976735 / 3275708
>>> iteratorEquals: 0.522302 = 116383 / 222827
>>> iteratorInc: 0.715444 = 132750 / 185549
>>> iteratorDec: 1 = 91808 / 91808
>>> iteratorDiff: 0.217455 = 40956 / 188342
>>> iteratorAdd: 1 = 16383 / 16383
>>> iteratorAddAssign: 0.299523 = 81884 / 273381
>>> iteratorArithmetic: 0.49082 = 480164 / 978290
>>> micros: 0.479792 = 22935 / 47802
>>>
>>> Distribution: increasing_dupes(100000)
>>> Example: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]
>>>
>>> move: 1 = 91808 / 91808
>>> moveAssign: 1 = 91808 / 91808
>>> compare: 0.573591 = 224558 / 391495
>>> iteratorDeref: 0.722385 = 540924 / 748803
>>> iteratorCopy: 0.298175 = 976735 / 3275708
>>> iteratorEquals: 0.522302 = 116383 / 222827
>>> iteratorInc: 0.715444 = 132750 / 185549
>>> iteratorDec: 1 = 91808 / 91808
>>> iteratorDiff: 0.217455 = 40956 / 188342
>>> iteratorAdd: 1 = 16383 / 16383
>>> iteratorAddAssign: 0.299523 = 81884 / 273381
>>> iteratorArithmetic: 0.49082 = 480164 / 978290
>>> micros: 0.449154 = 20675 / 46031
>>>
>>> Distribution: mostly_increasing(100000)
>>> Example: [3, 2, 3, 4, 5, 5, 6, 7, 9, 9, 11, 13, 12, 14, 15, 17, 17, 18, 21, 20]
>>>
>>> move: 0.681093 = 814725 / 1196203
>>> moveAssign: 0.703421 = 1809563 / 2572519
>>> compare: 0.924016 = 965320 / 1044700
>>> iteratorDeref: 0.847765 = 3802634 / 4485480
>>> iteratorCopy: 0.804068 = 11859111 / 14748894
>>> iteratorEquals: 1.59826 = 1388752 / 868913
>>> iteratorInc: 1.25659 = 1818421 / 1447111
>>> iteratorDec: 0.383738 = 537108 / 1399674
>>> iteratorDiff: 0.486488 = 562273 / 1155779
>>> iteratorSub: 0.0638659 = 6666 / 104375
>>> iteratorAdd: 0.184194 = 55586 / 301780
>>> iteratorAddAssign: 1.30369 = 1105439 / 847932
>>> iteratorArithmetic: 0.893672 = 5474245 / 6125564
>>> micros: 0.790603 = 233953 / 295917
>>>
>>> Distribution: decreasing_unique(100000)
>>> Example: [19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
>>>
>>> move: 0.40597 = 995312 / 2451688
>>> moveAssign: 0.45785 = 2459840 / 5372592
>>> compare: 0.489247 = 151631 / 309927
>>> iteratorDeref: 0.524798 = 3415934 / 6509043
>>> iteratorCopy: 0.563628 = 5971275 / 10594360
>>> iteratorEquals: 1.18395 = 843533 / 712475
>>> iteratorInc: 0.864454 = 1460463 / 1689462
>>> iteratorDec: 0.373125 = 1606976 / 4306800
>>> iteratorDiff: 0.495226 = 235930 / 476409
>>> iteratorSub: 0.248254 = 13824 / 55685
>>> iteratorAdd: 0.590877 = 135839 / 229894
>>> iteratorAddAssign: 0.421852 = 119646 / 283621
>>> iteratorArithmetic: 0.569514 = 4416211 / 7754346
>>> micros: 0.543145 = 167865 / 309061
>>>
>>> Distribution: decreasing_dupes(100000)
>>> Example: [4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0]
>>>
>>> move: 0.741695 = 1455276 / 1962096
>>> moveAssign: 0.736048 = 2826598 / 3840238
>>> compare: 0.641679 = 282851 / 440798
>>> iteratorDeref: 0.740565 = 3386062 / 4572271
>>> iteratorCopy: 0.628798 = 5008472 / 7965150
>>> iteratorEquals: 1.50206 = 794354 / 528843
>>> iteratorInc: 1.39602 = 2164546 / 1550510
>>> iteratorDec: 0.336639 = 837728 / 2488504
>>> iteratorDiff: 0.313457 = 88552 / 282501
>>> iteratorSub: 0.165388 = 3880 / 23460
>>> iteratorAdd: 0.377644 = 27139 / 71864
>>> iteratorAddAssign: 0.551364 = 182383 / 330785
>>> iteratorArithmetic: 0.776766 = 4098582 / 5276467
>>> micros: 0.71282 = 160398 / 225019
>>>
>>> Distribution: mostly_decreasing(100000)
>>> Example: [20, 19, 20, 18, 18, 15, 13, 13, 14, 13, 10, 9, 10, 8, 5, 7,
>>> 6, 4, 3, 2]
>>>
>>> move: 0.760602 = 2197435 / 2889075
>>> moveAssign: 0.768473 = 4591313 / 5974593
>>> compare: 0.932913 = 977678 / 1047984
>>> iteratorDeref: 0.838641 = 6633169 / 7909424
>>> iteratorCopy: 0.829028 = 15447960 / 18633830
>>> iteratorEquals: 1.7009 = 2000686 / 1176251
>>> iteratorInc: 1.52771 = 3788203 / 2479657
>>> iteratorDec: 0.363843 = 1369990 / 3765332
>>> iteratorDiff: 0.489186 = 595472 / 1217270
>>> iteratorSub: 0.0877827 = 10925 / 124455
>>> iteratorAdd: 0.201991 = 69640 / 344767
>>> iteratorAddAssign: 1.32756 = 1133512 / 853830
>>> iteratorArithmetic: 0.900303 = 8968428 / 9961562
>>> micros: 0.846298 = 377504 / 446065
>>>
>>> Distribution: sawtooth_unique(100000)
>>> Example: [0, 3, 6, 9, 12, 15, 18, 1, 4, 7, 10, 13, 16, 19, 2, 5, 8, 11, 14, 17]
>>>
>>> move: 0.569953 = 2912465 / 5110007
>>> moveAssign: 0.577701 = 6012427 / 10407511
>>> compare: 0.824488 = 1604387 / 1945920
>>> iteratorDeref: 0.678979 = 9299794 / 13696724
>>> iteratorCopy: 0.72546 = 26157352 / 36056250
>>> iteratorEquals: 2.00683 = 4131766 / 2058856
>>> iteratorInc: 1.10728 = 6498257 / 5868688
>>> iteratorDec: 0.111225 = 552519 / 4967557
>>> iteratorDiff: 0.405328 = 988144 / 2437889
>>> iteratorSub: 0.026249 = 6327 / 241038
>>> iteratorAdd: 0.089034 = 55918 / 628052
>>> iteratorAddAssign: 1.18737 = 2297233 / 1934717
>>> iteratorArithmetic: 0.801143 = 14530164 / 18136797
>>> micros: 0.706282 = 583937 / 826776
>>>
>>> Distribution: sawtooth_dupes(100000)
>>> Example: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
>>>
>>> move: 0.547721 = 2348873 / 4288447
>>> moveAssign: 0.543252 = 4613817 / 8492965
>>> compare: 0.664642 = 841503 / 1266099
>>> iteratorDeref: 0.599656 = 6292282 / 10493159
>>> iteratorCopy: 0.60663 = 15336816 / 25281976
>>> iteratorEquals: 2.24803 = 2859760 / 1272118
>>> iteratorInc: 1.07142 = 4999546 / 4666276
>>> iteratorDec: 0.0490551 = 205953 / 4198405
>>> iteratorDiff: 0.314518 = 473157 / 1504390
>>> iteratorSub: 0.0371951 = 5998 / 161258
>>> iteratorAdd: 0.0875684 = 34672 / 395942
>>> iteratorAddAssign: 0.900009 = 1233077 / 1370072
>>> iteratorArithmetic: 0.72316 = 9812163 / 13568461
>>> micros: 0.610891 = 353102 / 578011
>>>
>>> Distribution: increasing_with_swap(100000)
>>> Example: [19, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 0]
>>>
>>> move: 1 = 291782 / 291782
>>> moveAssign: 1 = 491779 / 491779
>>> compare: 0.57449 = 225064 / 391763
>>> iteratorDeref: 0.819553 = 941922 / 1149312
>>> iteratorCopy: 0.377964 = 1392386 / 3683915
>>> iteratorEquals: 0.525758 = 117408 / 223312
>>> iteratorInc: 0.863708 = 333050 / 385605
>>> iteratorDec: 0.999753 = 291716 / 291788
>>> iteratorDiff: 0.224408 = 42528 / 189512
>>> iteratorSub: 0.888889 = 104 / 117
>>> iteratorAdd: 0.997151 = 16801 / 16849
>>> iteratorAddAssign: 0.302604 = 82864 / 273836
>>> iteratorArithmetic: 0.640448 = 884471 / 1381019
>>> micros: 0.616124 = 36233 / 58808
>>>
>>> Distribution: increasing_fast_slow(100000)
>>> Example: [0, 4, 8, 12, 0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18]
>>>
>>> move: 0.640348 = 825772 / 1289568
>>> moveAssign: 0.627081 = 1559792 / 2487384
>>> compare: 0.714235 = 431286 / 603843
>>> iteratorDeref: 0.694913 = 2421875 / 3485151
>>> iteratorCopy: 0.637176 = 5645877 / 8860786
>>> iteratorEquals: 1.7552 = 953880 / 543458
>>> iteratorInc: 1.08941 = 1676609 / 1539005
>>> iteratorDec: 0.135904 = 154533 / 1137073
>>> iteratorDiff: 0.388132 = 192787 / 496705
>>> iteratorSub: 0.00577964 = 141 / 24396
>>> iteratorAdd: 0.207287 = 17272 / 83324
>>> iteratorAddAssign: 0.849015 = 431320 / 508024
>>> iteratorArithmetic: 0.790987 = 3426542 / 4331985
>>> micros: 0.777699 = 153552 / 197444
>>>
>>>
>>>
>>> #include <iostream>
>>> #include <stdio.h>
>>> #include <vector>
>>> #include <cstdlib>
>>> #include <algorithm>
>>> #include <functional>
>>> #include <ranges>
>>> #include <chrono>
>>> #include <cmath>
>>> #include <numeric>
>>>
>>> using namespace std;
>>>
>>> #define LINE0 0
>>> #define cout_line (cout << "line " << __LINE__ - LINE0 << ": ")
>>> #define vout(v) #v << " = " << (v)
>>> #define print1(v) cout_line << vout(v) << endl
>>> #define print2(v1, v2) cout_line << vout(v1) << ", " << vout(v2) << endl
>>> #define print3(v1, v2, v3) cout_line << vout(v1) << ", " << vout(v2)
>>> << ", " << vout(v3) << endl
>>> #define GET_MACRO(_1,_2,_3,NAME,...) NAME
>>> #define print(...) GET_MACRO(__VA_ARGS__, print3, print2, print1)(__VA_ARGS__)
>>>
>>> template <class Ch, class Tr, class Container>
>>> basic_ostream <Ch, Tr> & operator << (basic_ostream <Ch, Tr> & os,
>>> Container const& x);
>>>
>>> template <class X, class Y>
>>> ostream & operator << (ostream & os, pair <X, Y> const& p) {
>>> return os << p.first << ":" << p.second;
>>> }
>>>
>>> template <class Ch, class Tr, class Container>
>>> basic_ostream <Ch, Tr> & operator << (basic_ostream <Ch, Tr> & os,
>>> Container const& x) {
>>>       os << "[";
>>>       bool first = true;
>>>       for(auto const& y : x) {
>>>           if (!first) os << ", ";
>>>           os << y;
>>>           first = false;
>>>       }
>>>       return os << "]";
>>> }
>>>
>>>
>>> struct Counter {
>>>     int construct = 0;
>>>     int copy = 0;
>>>     int assign = 0;
>>>     int move = 0;
>>>     int moveAssign = 0;
>>>     int compare = 0;
>>>     int iteratorDeref = 0;
>>>     int iteratorCopy = 0;
>>>     int iteratorEquals = 0;
>>>     int iteratorInc = 0;
>>>     int iteratorDec = 0;
>>>     int iteratorDiff = 0;
>>>     int iteratorSub = 0;
>>>     int iteratorAdd = 0;
>>>     int iteratorAddAssign = 0;
>>>     int iteratorArithmetic = 0;
>>>     int micros = 0;
>>> };
>>>
>>> int next_element_id = 0;
>>>
>>> class Element {
>>>     Counter* counter;
>>> public:
>>>     int val;
>>>     int id;  // Distinguis identical values to verify stability.
>>>     Element(int val, Counter* counter)
>>>         : val(val), counter(counter), id(++next_element_id) {
>>>       ++counter->construct;
>>>     }
>>>
>>>     Element(Element const& other)
>>>         : val(other.val), counter(other.counter), id(other.id) {
>>>       ++counter->copy;
>>>     }
>>>
>>>     Element& operator=(Element const& other) {
>>>       val = other.val;
>>>       counter = other.counter;
>>>       id = other.id;
>>>       ++counter->assign;
>>>       return *this;
>>>     }
>>>
>>>     Element(Element&& other) noexcept
>>>         : val(move(other.val)), counter(other.counter), id(other.id) {
>>>       ++counter->move;
>>>     }
>>>
>>>     Element& operator=(Element&& other) noexcept {
>>>       val = move(other.val);
>>>       counter = other.counter;
>>>       id = other.id;
>>>       ++counter->moveAssign;
>>>       return *this;
>>>     }
>>>
>>>     bool operator<(Element const& other) const {
>>>       ++counter->compare;
>>>       return val < other.val;
>>>     }
>>>
>>>     bool operator==(Element const& other) const {
>>>       return val == other.val && id == other.id;
>>>     }
>>> };
>>>
>>> ostream & operator << (ostream & os, Element const& e) {
>>> return os << e.val << "(" << e.id << ")";
>>> }
>>>
>>>
>>> class RandomIterator {
>>>     typedef vector<Element>::iterator It;
>>>     It it;
>>>     Counter* counter;
>>>
>>> public:
>>>     using iterator_category = std::random_access_iterator_tag;
>>>     using difference_type   = std::ptrdiff_t;
>>>     using value_type        = Element;
>>>     using pointer           = value_type*;
>>>     using reference         = value_type&;
>>>
>>>     RandomIterator(It it, Counter* counter) : it(it), counter(counter) {
>>>     }
>>>
>>>     RandomIterator(RandomIterator const& other) : it(other.it),
>>> counter(other.counter) {
>>>       counter->iteratorCopy += 1;
>>>     }
>>>
>>>     bool operator== (RandomIterator other) {
>>>       counter->iteratorEquals += 1;
>>>       counter->iteratorArithmetic += 1;
>>>       return it == other.it;
>>>     }
>>>
>>>     RandomIterator operator++ () {
>>>       counter->iteratorInc += 1;
>>>       counter->iteratorArithmetic += 1;
>>>       return RandomIterator(++it, counter);
>>>     }
>>>
>>>     RandomIterator operator-- () {
>>>       counter->iteratorDec += 1;
>>>       counter->iteratorArithmetic += 1;
>>>       return RandomIterator(--it, counter);
>>>     }
>>>
>>>     Element& operator* () {
>>>       counter->iteratorDeref += 1;
>>>       return *it;
>>>     }
>>>
>>>     difference_type operator- (RandomIterator other) {
>>>       counter->iteratorDiff += 1;
>>>       counter->iteratorArithmetic += 1;
>>>       return it - other.it;
>>>     }
>>>
>>>     RandomIterator operator- (difference_type diff) {
>>>       counter->iteratorSub += 1;
>>>       counter->iteratorArithmetic += 1;
>>>       return RandomIterator(it - diff, counter);
>>>     }
>>>
>>>     RandomIterator operator+ (difference_type diff) {
>>>       counter->iteratorAdd += 1;
>>>       counter->iteratorArithmetic += 1;
>>>       return RandomIterator(it + diff, counter);
>>>     }
>>>
>>>     RandomIterator operator+= (difference_type diff) {
>>>       counter->iteratorAddAssign += 1;
>>>       counter->iteratorArithmetic += 1;
>>>       return RandomIterator(it += diff, counter);
>>>     }
>>> };
>>>
>>>
>>> class TestCase {
>>>     Counter counter;
>>>     vector<Element> elements;
>>>
>>> public:
>>>     TestCase(vector<int> const& data) {
>>>       elements.reserve(data.size());
>>>       for (int i: data) {
>>>         elements.push_back(Element(i, &counter));
>>>       }
>>>     }
>>>
>>>     Counter sort(
>>>         void algo(RandomIterator, RandomIterator),
>>>         vector<Element>::iterator begin,
>>>         vector<Element>::iterator end) {
>>>       counter = Counter();
>>>       auto t0 = std::chrono::system_clock::now();
>>>
>>>       algo(RandomIterator(begin, &counter), RandomIterator(end, &counter));
>>>
>>>       auto t1 = std::chrono::system_clock::now();
>>>       counter.micros =
>>> std::chrono::duration_cast<std::chrono::microseconds>(t1 -
>>> t0).count();
>>>       return counter;
>>>     }
>>>
>>>     pair<Counter,Counter> benchmark(
>>>         void new_algo(RandomIterator, RandomIterator),
>>>         void cur_algo(RandomIterator, RandomIterator)) {
>>>       vector<Element> toSort1 = elements;
>>>       Counter counter1 = sort(new_algo, toSort1.begin(), toSort1.end());
>>>
>>>       vector<Element> toSort2 = elements;
>>>       Counter counter2 = sort(cur_algo, toSort2.begin(), toSort2.end());
>>>
>>>       if (toSort1 != toSort2) {
>>>         cout << "INCORRECT RESULT!" << endl;
>>>         exit(1);
>>>       }
>>>
>>>       return make_pair(counter1, counter2);
>>>     }
>>> };
>>>
>>>
>>> void relative_metric(string name, int m1, int m2) {
>>>     if (m1 == 0 && m2 == 0) return;
>>>     if (m2 == 0) {
>>>       cout << name << ": inf = " << m1 << " / " << m2 << endl;
>>>     } else {
>>>       cout << name << ": " << m1 / (double) m2 << " = " << m1 << " / "
>>> << m2 << endl;
>>>     }
>>> }
>>>
>>>
>>> void relative_report(Counter const& c1, Counter const& c2) {
>>>     cout << endl;
>>>     relative_metric("construct", c1.construct, c2.construct);
>>>     relative_metric("copy", c1.copy, c2.copy);
>>>     relative_metric("assign", c1.assign, c2.assign);
>>>     relative_metric("move", c1.move, c2.move);
>>>     relative_metric("moveAssign", c1.moveAssign, c2.moveAssign);
>>>     relative_metric("compare", c1.compare, c2.compare);
>>>     relative_metric("iteratorDeref", c1.iteratorDeref, c2.iteratorDeref);
>>>     relative_metric("iteratorCopy", c1.iteratorCopy, c2.iteratorCopy);
>>>     relative_metric("iteratorEquals", c1.iteratorEquals, c2.iteratorEquals);
>>>     relative_metric("iteratorInc", c1.iteratorInc, c2.iteratorInc);
>>>     relative_metric("iteratorDec", c1.iteratorDec, c2.iteratorDec);
>>>     relative_metric("iteratorDiff", c1.iteratorDiff, c2.iteratorDiff);
>>>     relative_metric("iteratorSub", c1.iteratorSub, c2.iteratorSub);
>>>     relative_metric("iteratorAdd", c1.iteratorAdd, c2.iteratorAdd);
>>>     relative_metric("iteratorAddAssign", c1.iteratorAddAssign,
>>> c2.iteratorAddAssign);
>>>     relative_metric("iteratorArithmetic", c1.iteratorArithmetic,
>>> c2.iteratorArithmetic);
>>>     relative_metric("micros", c1.micros, c2.micros);
>>>     cout << endl;
>>> }
>>>
>>>
>>> /// This is a helper function for the merge routines.
>>> template<typename _BidirectionalIterator, typename _Distance, typename _Compare>
>>> void _my_merge_without_buffer(
>>> _BidirectionalIterator __first,
>>> _BidirectionalIterator __middle,
>>> _BidirectionalIterator __last,
>>> _Distance __len1, _Distance __len2,
>>> _Compare __comp) {
>>>
>>> if (__len1 == 0 || __len2 == 0)
>>> return;
>>>
>>> if (__len1 + __len2 == 2) {
>>>     if (__comp(__middle, __first))
>>>       std::iter_swap(__first, __middle);
>>>     return;
>>> }
>>>
>>> _BidirectionalIterator __first_cut = __first;
>>> _BidirectionalIterator __second_cut = __middle;
>>> _Distance __len11 = 0;
>>> _Distance __len22 = 0;
>>>
>>>
>>> #define NEW_ALGO 1
>>>
>>> #ifdef NEW_ALGO
>>>     // Prevent stack overflow
>>> if (__len1 > 4 * __len2 || __len2 > 4 * __len1) {
>>> #endif
>>>       if (__len1 > __len2)
>>>         {
>>>           __len11 = __len1 / 2;
>>>           std::advance(__first_cut, __len11);
>>>           __second_cut
>>>             = std::__lower_bound(__middle, __last, *__first_cut,
>>>                __gnu_cxx::__ops::__iter_comp_val(__comp));
>>>           __len22 = std::distance(__middle, __second_cut);
>>>         }
>>>       else
>>>         {
>>>           __len22 = __len2 / 2;
>>>           std::advance(__second_cut, __len22);
>>>           __first_cut
>>>             = std::__upper_bound(__first, __middle, *__second_cut,
>>>                __gnu_cxx::__ops::__val_comp_iter(__comp));
>>>           __len11 = std::distance(__first, __first_cut);
>>>         }
>>> #ifdef NEW_ALGO
>>>     } else
>>>     {
>>>       _Distance __cut_range;
>>>       // __cut_len is always equal to __middle - __first_cut (and eventually to
>>>       // __second_cut - __middle) but we keep track of it to avoid calculating it
>>>       // later which can be expensive for bidirectional iterators.
>>>       _Distance __cut_len;
>>>
>>> if (__len1 > __len2)
>>> {
>>> std::advance(__first_cut, __len1 - __len2);
>>> __cut_range = __len2;
>>> }
>>> else
>>> {
>>> __cut_range = __len1;
>>> }
>>>       __cut_len = __cut_range;
>>>
>>> while (__cut_range > 0) {
>>> _Distance __half_cut_range = __cut_range / 2;
>>> _BidirectionalIterator __mid_first_cut = __first_cut;
>>> std::advance(__mid_first_cut, __cut_range - __half_cut_range - 1);
>>> _BidirectionalIterator __mid_second_cut = __second_cut;
>>> std::advance(__mid_second_cut, __half_cut_range);
>>>
>>> if (__comp(__mid_second_cut, __mid_first_cut)) {
>>> __cut_range -= __half_cut_range + 1;
>>> __second_cut = __mid_second_cut;
>>> ++__second_cut;
>>> } else {
>>>           __cut_len -= __cut_range - __half_cut_range;
>>> __cut_range = __half_cut_range;
>>> __first_cut = __mid_first_cut;
>>> ++__first_cut;
>>> }
>>> }
>>> __len22 = __cut_len;
>>>       __len11 = __len1 - __cut_len;
>>> }
>>> #endif
>>>
>>>     _BidirectionalIterator __new_middle
>>>         = std::rotate(__first_cut, __middle, __second_cut);
>>>     _my_merge_without_buffer(__first, __first_cut, __new_middle,
>>>         __len11, __len22, __comp);
>>>     _my_merge_without_buffer(__new_middle, __second_cut, __last,
>>>         __len1 - __len11, __len2 - __len22, __comp);
>>> }
>>>
>>> /// This is a helper function for the stable sorting routines.
>>> template<typename _RandomAccessIterator, typename _Compare>
>>>     void
>>>     _my_inplace_stable_sort(_RandomAccessIterator __first,
>>>         _RandomAccessIterator __last, _Compare __comp)
>>>     {
>>>
>>>       if (__last - __first < 15)
>>>         {
>>>           std::__insertion_sort(__first, __last, __comp);
>>>           return;
>>>         }
>>>       _RandomAccessIterator __middle = __first + (__last - __first) / 2;
>>>       _my_inplace_stable_sort(__first, __middle, __comp);
>>>       _my_inplace_stable_sort(__middle, __last, __comp);
>>>       _my_merge_without_buffer(__first, __middle, __last,
>>>           __middle - __first,
>>>           __last - __middle,
>>>           __comp);
>>>     }
>>>
>>>
>>> template<typename Iterator>
>>> void my_inplace_stable_sort(Iterator start, Iterator end) {
>>>     _my_inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less_iter());
>>> }
>>>
>>> template<typename Iterator>
>>> void std_inplace_stable_sort(Iterator start, Iterator end) {
>>>     std::__inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less_iter());
>>> }
>>>
>>> void benchmark(string const& dist_name,
>>>       vector<int> dist_func(int),
>>>       int n) {
>>>     vector<int> v = dist_func(n);
>>>     cout << "Distribution: " << dist_name << "(" << n << ")" << endl;
>>>     cout << "Example: " << dist_func(20) << endl;
>>>     TestCase testCase(v);
>>>
>>>     pair<Counter, Counter> counter_pair =
>>>         testCase.benchmark(
>>>             my_inplace_stable_sort
>>>             , std_inplace_stable_sort
>>>             );
>>>     Counter cKloet = counter_pair.first;
>>>     Counter cInplace = counter_pair.second;
>>>
>>>     relative_report(cKloet, cInplace);
>>> }
>>>
>>>
>>> vector<int> random_unique(int n) {
>>>     vector<int> v(n);
>>>     for (int i = 0; i < n; ++i) {
>>>       v[i] = i;
>>>     }
>>>     for (int i = n - 1; i >= 0; --i) {
>>>       swap(v[i], v[rand() % (i + 1)]);
>>>     }
>>>     return v;
>>> }
>>>
>>> vector<int> random_dupes(int n) {
>>>     vector<int> v(n);
>>>     int r = (int)sqrt(n);
>>>     for (int i = 0; i < n; ++i) {
>>>       v[i] = i / r;
>>>     }
>>>     for (int i = n - 1; i >= 0; --i) {
>>>       swap(v[i], v[rand() % (i + 1)]);
>>>     }
>>>     return v;
>>> }
>>>
>>> vector<int> increasing_unique(int n) {
>>>     vector<int> v(n);
>>>     for (int i = 0; i < n; ++i) {
>>>       v[i] = i;
>>>     }
>>>     return v;
>>> }
>>>
>>>
>>> vector<int> increasing_dupes(int n) {
>>>     vector<int> v(n);
>>>     int r = (int)sqrt(n);
>>>     for (int i = 0; i < n; ++i) {
>>>       v[i] = i / r;
>>>     }
>>>     return v;
>>> }
>>>
>>>
>>> vector<int> mostly_increasing(int n) {
>>>     vector<int> v(n);
>>>     int r = (int)sqrt(n);
>>>     for (int i = 0; i < n; ++i) {
>>>       v[i] = i + rand() % r;
>>>     }
>>>     return v;
>>> }
>>>
>>>
>>> vector<int> decreasing_unique(int n) {
>>>     vector<int> v(n);
>>>     for (int i = 0; i < n; ++i) {
>>>       v[i] = n - 1 - i;
>>>     }
>>>     return v;
>>> }
>>>
>>> vector<int> decreasing_dupes(int n) {
>>>     vector<int> v(n);
>>>     int r = (int)sqrt(n);
>>>     for (int i = 0; i < n; ++i) {
>>>       v[i] = (n - 1 - i) / r;
>>>     }
>>>     return v;
>>> }
>>>
>>> vector<int> mostly_decreasing(int n) {
>>>     vector<int> v(n);
>>>     int r = (int)sqrt(n);
>>>     for (int i = 0; i < n; ++i) {
>>>       v[i] = n - 1 - i + rand() % r;
>>>     }
>>>     return v;
>>> }
>>>
>>> vector<int> sawtooth_unique(int n) {
>>>     int r = (int)sqrt(n);
>>>     while (gcd(r, n) != 1) ++r;
>>>     vector<int> v(n);
>>>     for (int i = 0; i < n; ++i) {
>>>       v[(i * r) % n] = i;
>>>     }
>>>     return v;
>>> }
>>>
>>> vector<int> sawtooth_dupes(int n) {
>>>     int r = (int)sqrt(n);
>>>     vector<int> v(n);
>>>     for (int i = 0; i < n; ++i) {
>>>       v[i] = i % r;
>>>     }
>>>     return v;
>>> }
>>>
>>> vector<int> increasing_with_swap(int n) {
>>>     vector<int> v(n);
>>>     for (int i = 0; i < n; ++i) {
>>>       v[i] = i;
>>>     }
>>>     swap(v[0], v[n - 1]);
>>>     return v;
>>> }
>>>
>>> vector<int> increasing_fast_slow(int n) {
>>>     int r = (int)sqrt(n);
>>>     vector<int> v(n);
>>>     for (int i = 0; i < r; ++i) {
>>>       v[i] = i * r;
>>>     }
>>>     for (int i = r; i < n; ++i) {
>>>       v[i] = (i - r) * n / (n - r);
>>>     }
>>>     return v;
>>> }
>>>
>>>
>>> #define BENCHMARK(dist, n) benchmark(#dist, dist, n)
>>>
>>> int main() {
>>>     int n = 100000;
>>>     BENCHMARK(random_unique, n);
>>>     BENCHMARK(random_dupes, n);
>>>     BENCHMARK(increasing_unique, n);
>>>     BENCHMARK(increasing_dupes, n);
>>>     BENCHMARK(mostly_increasing, n);
>>>     BENCHMARK(decreasing_unique, n);
>>>     BENCHMARK(decreasing_dupes, n);
>>>     BENCHMARK(mostly_decreasing, n);
>>>     BENCHMARK(sawtooth_unique, n);
>>>     BENCHMARK(sawtooth_dupes, n);
>>>     BENCHMARK(increasing_with_swap, n);
>>>     BENCHMARK(increasing_fast_slow, n);
>>>
>>>     return 0;
>>> }
>>


  reply	other threads:[~2021-11-25  5:35 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-18 12:58 David de Kloet
2021-11-24 18:33 ` François Dumont
2021-11-24 18:48   ` David de Kloet
2021-11-25  5:35     ` François Dumont [this message]
2021-11-25 19:51       ` David de Kloet
2021-12-10 15:31         ` David de Kloet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54dbf38f-6409-9971-bdec-457ac6d2473a@gmail.com \
    --to=frs.dumont@gmail.com \
    --cc=dskloet@gmail.com \
    --cc=libstdc++@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).