From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi1-x236.google.com (mail-oi1-x236.google.com [IPv6:2607:f8b0:4864:20::236]) by sourceware.org (Postfix) with ESMTPS id 8F8F6385780A for ; Fri, 10 Dec 2021 15:31:12 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8F8F6385780A Received: by mail-oi1-x236.google.com with SMTP id t23so13726225oiw.3 for ; Fri, 10 Dec 2021 07:31:12 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=MvMdytJc5UnUZiyYpG3uq8W1qzdqt9T3qsC6R67TuII=; b=hOYLLgk/5Xa42RWeLUS5o8qeAT/XPFPbnCow1bLy/usRVTXWnR2a82Cm9s/TIS42mO 9xGEH9F50aNq8MQ8XHfNexm3psqR1GbBdsRRl7eXf/ptPMhkkwa2EHmC/9avxczAlhbH 7uoE46YYa7rN5QbBt5F8orGKHmPrh+8L0H4oW9wv7WLrID9oxgUz/vFImHOGJpvuDHtl Y16Cj0KWS4CKSElzqqR9g2ZacoAhcKGPZ3FO2m7bavTRfgJolSHqpE6gC1bXZe4ePN6m OVcPFmgyaLyxjQ+TlUEH3hYC50+/PDDzAJ+43UAQ/zU/XGbVfH3hlshTtm3Pl9rUlBLU ogog== X-Gm-Message-State: AOAM533d8RIOCcTZWIanfsLyYMqKknMmHRy1TbOlmNGD2ZY2d7KgzS1L 7XwBT/YEYWV+m6vk3ConEGVhRSSsAnuy65mN2x5m2kd/BU4= X-Google-Smtp-Source: ABdhPJyTG7AWN6Pk+XVIDe8ZtHMb96Hmzpvty6uwWq83LdbxZN3pVXqysxhDa4NTAT5rP6LvfTikSXsI7QhKEJASSI4= X-Received: by 2002:aca:1a04:: with SMTP id a4mr12533266oia.153.1639150271671; Fri, 10 Dec 2021 07:31:11 -0800 (PST) MIME-Version: 1.0 References: <9f7342af-398c-fa0f-4182-761644d6e5f3@gmail.com> <54dbf38f-6409-9971-bdec-457ac6d2473a@gmail.com> In-Reply-To: From: David de Kloet Date: Fri, 10 Dec 2021 16:31:00 +0100 Message-ID: Subject: Re: __inplace_stable_sort improvement To: =?UTF-8?Q?Fran=C3=A7ois_Dumont?= Cc: libstdc++@gcc.gnu.org X-Spam-Status: No, score=1.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, HTML_MESSAGE, KAM_SHORT, RCVD_IN_DNSWL_NONE, SCC_10_SHORT_WORD_LINES, SCC_20_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.4 X-Spam-Level: * X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: libstdc++@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++ mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Dec 2021 15:31:23 -0000 I had another look at this and decided to try `make check-performance` as you suggested, but I got make: *** No rule to make target 'check-performance'. Stop. Then I tried to run `configure` (from another directory) and got Building GCC requires GMP 4.2+, MPFR 3.1.0+ and MPC 0.8.0+. ... Since I have no idea what I'm doing, I thought I'd better check if I'm missing something obvious before trying to figure it out. On Thu, Nov 25, 2021 at 8:51 PM David de Kloet wrote: > On Thu, Nov 25, 2021 at 6:35 AM Fran=C3=A7ois Dumont > wrote: > > > > On 24/11/21 7:48 pm, David de Kloet wrote: > > > On Wed, Nov 24, 2021 at 7:33 PM Fran=C3=A7ois Dumont > wrote: > > >> > > >> On 18/11/21 1:58 pm, David de Kloet via Libstdc++ wrote: > > >>> Hello libstdc++, > > >>> > > >>> Apologies if this is the wrong list for my message. I believe I hav= e > an > > >>> improvement for __inplace_stable_sort and I'm wondering if this is > worth > > >>> pursuing and if so how to proceed. > > >> Of course, anyone interested in enhancing the library is welcome. > > >> > > >> The best way to do so would be to provide a patch on the current > library > > >> code. I'm the last one to have touch to this algo and it would be > easier > > >> for me to understand if you try to implement it directly on the > library. > > > _my_merge_without_buffer below preprocesses exactly to > > > __merge_without_buffer from the library if you undefine NEW_ALGO so > > > you can see exactly what my changes are. > > > > > > I currently don't have much time, but when I do I'd be happy to > > > provide it as a patch. If you find time before me, you're also more > > > than welcome to use the code. > > > > > >> The good news is that you are trying to enhance a piece of code rare= ly > > >> used as it is only when the buffer allocation fails. So it is less > > >> critical and so perhaps more open to modifications. > > >> > > >> During my last work on it I've added a performance test: > > >> performance/25_algorithms/inplace_merge.cc. It is also supposed to > check > > >> performance when we fail to get a buffer and so use > > >> __merge_without_buffer. Use 'make check-performance' to run it, see > > >> library manual guide. Do not hesitate to change it if you think that > > >> your benches are better. > > > I don't currently have the environment to `make check-performance`. I > > > just copied the code from the web and adjusted it. But (again) when I > > > have time, I'll look into that. > > > > > > If you know of any "contributing to stdlibc++" guides, that would be > great too. > > > > > https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.htm= l > > > > > > > > > >>> __inplace_stable_sort does a merge sort where in each recursive ste= p > the 2 > > >>> sublists are merged in-place. The merge step is itself also > recursive. In order > > >>> to merge [A, B) and [B, C), a tail section S1 of [A, B) and a head > section S2 > > >>> of [B, C) are found such that all elements in S1 are greater than > all elements > > >>> in S2 and then those sections are swapped/rotated and the shorter > sections are > > >>> merged recursively. This is done by the function > __merge_without_buffer. > > >>> > > >>> In the current implementation, __merge_without_buffer takes the hal= f > point of > > >>> the longer section and does a binary search (via lower_bound or > upper_bound) to > > >>> find the corresponding cut point on the other section. Then it > rotates the > > >>> sections. > > >>> > > >>> My proposal is to find the cut points such that both rotated > sections have the > > >>> same length such that they can always simply be swapped. > > >> What do you mean by swapped ? > > > I mean the first element of section 1 gets swapped with the first > > > element of section 2. The second element of section 1 gets swapped > > > with the second element of section 2. Etc. for each index. > > > > > >> I see in the code below that you are just > > >> changing how the cut positions are computed and then call the same > > >> std::rotate algo. I cannot see any reason why changing the position > > >> would make the rotate faster. > > > The implementation of rotate does a series of such section swaps. But > > > if both section have the same length, it's only a single swap of > > > sections which makes rotate about 1.5-2 times faster. > > > > Ok, I see now, you try to use as much as possible the std::swap_ranges > > in __rotate for random access iterator. Maybe it means that this > > optimization should be limited to this type of iterator. We'll see > > depending on benches. > > The binary search implementation is a bit awkward for the > bidi-iterator, so having a separate implementation for random access > iterators might improve the readability. However, if you look at my > benchmarks, it also does fewer comparisons than the current algorithm. > But rotate() doesn't do any comparisons. So there must be another > benefit to choosing the cut points this way, but I'm not entirely sure > why. > > > > > I'll have a try then when I have time. > > > > Thanks > > > > > > > > The purpose of the code I added is to choose the cut positions such > > > that the rotated sections are of equal length, which makes rotate > > > faster by doing fewer element swaps, resulting in fewer calls to the > > > copy/assignment constructor. > > > > > >>> I have implemented > > >>> this and it seems to perform better in general and in the special > cases that > > >>> I've tried. > > >>> > > >>> Unfortunately, since we are no longer guaranteed to halve the longe= r > section in > > >>> each step, there is a possibility for a stack overflow. To avoid > this, the new > > >>> algorithm falls back to the old behavior whenever the sections have > > >>> significantly different lengths. > > >>> > > >>> Also, where the current algorithm performs the binary search > implicitly via > > >>> lower_bound/upper_bound, the new algorithm has to implement the > binary search > > >>> explicitly because it has to look at both sections to determine the > next step > > >>> in the search. > > >>> > > >>> I have created wrappers for my value objects as well as my iterator= s > to count > > >>> the number of operations performed by the current and my algorithm > and run > > >>> benchmarks on various distributions. My code is at the bottom but > below is the > > >>> output as run on my machine. For each operation it outputs the rati= o > between > > >>> the number of operations performed by my algorithm over the number > performed by > > >>> the current algorithm. > > >>> > > >>> Thanks for your attention, > > >>> David > > >>> > > >>> > > >>> Distribution: random_unique(100000) > > >>> Example: [19, 13, 5, 0, 10, 6, 15, 11, 16, 9, 14, 18, 4, 17, 1, 7, > 8, 12, 3, 2] > > >>> > > >>> move: 0.588026 =3D 3331848 / 5666154 > > >>> moveAssign: 0.594789 =3D 6852819 / 11521431 > > >>> compare: 0.914997 =3D 2028977 / 2217469 > > >>> iteratorDeref: 0.718466 =3D 11001863 / 15312994 > > >>> iteratorCopy: 0.791332 =3D 33977189 / 42936712 > > >>> iteratorEquals: 2.27841 =3D 5174606 / 2271143 > > >>> iteratorInc: 1.2143 =3D 7697971 / 6339449 > > >>> iteratorDec: 0.0833529 =3D 473556 / 5681340 > > >>> iteratorDiff: 0.444865 =3D 1362598 / 3062950 > > >>> iteratorSub: 0.0200683 =3D 6272 / 312533 > > >>> iteratorAdd: 0.0695467 =3D 57149 / 821736 > > >>> iteratorAddAssign: 1.36491 =3D 3141949 / 2301950 > > >>> iteratorArithmetic: 0.861623 =3D 17914101 / 20791101 > > >>> micros: 0.663704 =3D 650825 / 980596 > > >>> > > >>> Distribution: random_dupes(100000) > > >>> Example: [4, 4, 1, 3, 0, 3, 2, 2, 4, 4, 0, 3, 1, 0, 3, 2, 0, 2, 1, = 1] > > >>> > > >>> move: 0.570041 =3D 2949623 / 5174401 > > >>> moveAssign: 0.577736 =3D 6087819 / 10537375 > > >>> compare: 0.845419 =3D 1530877 / 1810790 > > >>> iteratorDeref: 0.678233 =3D 9232869 / 13613114 > > >>> iteratorCopy: 0.742699 =3D 25652244 / 34539225 > > >>> iteratorEquals: 2.41643 =3D 4123881 / 1706601 > > >>> iteratorInc: 1.15617 =3D 6554374 / 5669028 > > >>> iteratorDec: 0.101004 =3D 539991 / 5346219 > > >>> iteratorDiff: 0.417093 =3D 959972 / 2301580 > > >>> iteratorSub: 0.0350046 =3D 8882 / 253738 > > >>> iteratorAdd: 0.0972571 =3D 64122 / 659304 > > >>> iteratorAddAssign: 1.2279 =3D 2247197 / 1830107 > > >>> iteratorArithmetic: 0.81605 =3D 14498419 / 17766577 > > >>> micros: 0.699386 =3D 554426 / 792732 > > >>> > > >>> Distribution: increasing_unique(100000) > > >>> Example: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, > 17, 18, 19] > > >>> > > >>> move: 1 =3D 91808 / 91808 > > >>> moveAssign: 1 =3D 91808 / 91808 > > >>> compare: 0.573591 =3D 224558 / 391495 > > >>> iteratorDeref: 0.722385 =3D 540924 / 748803 > > >>> iteratorCopy: 0.298175 =3D 976735 / 3275708 > > >>> iteratorEquals: 0.522302 =3D 116383 / 222827 > > >>> iteratorInc: 0.715444 =3D 132750 / 185549 > > >>> iteratorDec: 1 =3D 91808 / 91808 > > >>> iteratorDiff: 0.217455 =3D 40956 / 188342 > > >>> iteratorAdd: 1 =3D 16383 / 16383 > > >>> iteratorAddAssign: 0.299523 =3D 81884 / 273381 > > >>> iteratorArithmetic: 0.49082 =3D 480164 / 978290 > > >>> micros: 0.479792 =3D 22935 / 47802 > > >>> > > >>> Distribution: increasing_dupes(100000) > > >>> Example: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, = 4] > > >>> > > >>> move: 1 =3D 91808 / 91808 > > >>> moveAssign: 1 =3D 91808 / 91808 > > >>> compare: 0.573591 =3D 224558 / 391495 > > >>> iteratorDeref: 0.722385 =3D 540924 / 748803 > > >>> iteratorCopy: 0.298175 =3D 976735 / 3275708 > > >>> iteratorEquals: 0.522302 =3D 116383 / 222827 > > >>> iteratorInc: 0.715444 =3D 132750 / 185549 > > >>> iteratorDec: 1 =3D 91808 / 91808 > > >>> iteratorDiff: 0.217455 =3D 40956 / 188342 > > >>> iteratorAdd: 1 =3D 16383 / 16383 > > >>> iteratorAddAssign: 0.299523 =3D 81884 / 273381 > > >>> iteratorArithmetic: 0.49082 =3D 480164 / 978290 > > >>> micros: 0.449154 =3D 20675 / 46031 > > >>> > > >>> Distribution: mostly_increasing(100000) > > >>> Example: [3, 2, 3, 4, 5, 5, 6, 7, 9, 9, 11, 13, 12, 14, 15, 17, 17, > 18, 21, 20] > > >>> > > >>> move: 0.681093 =3D 814725 / 1196203 > > >>> moveAssign: 0.703421 =3D 1809563 / 2572519 > > >>> compare: 0.924016 =3D 965320 / 1044700 > > >>> iteratorDeref: 0.847765 =3D 3802634 / 4485480 > > >>> iteratorCopy: 0.804068 =3D 11859111 / 14748894 > > >>> iteratorEquals: 1.59826 =3D 1388752 / 868913 > > >>> iteratorInc: 1.25659 =3D 1818421 / 1447111 > > >>> iteratorDec: 0.383738 =3D 537108 / 1399674 > > >>> iteratorDiff: 0.486488 =3D 562273 / 1155779 > > >>> iteratorSub: 0.0638659 =3D 6666 / 104375 > > >>> iteratorAdd: 0.184194 =3D 55586 / 301780 > > >>> iteratorAddAssign: 1.30369 =3D 1105439 / 847932 > > >>> iteratorArithmetic: 0.893672 =3D 5474245 / 6125564 > > >>> micros: 0.790603 =3D 233953 / 295917 > > >>> > > >>> Distribution: decreasing_unique(100000) > > >>> Example: [19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, > 3, 2, 1, 0] > > >>> > > >>> move: 0.40597 =3D 995312 / 2451688 > > >>> moveAssign: 0.45785 =3D 2459840 / 5372592 > > >>> compare: 0.489247 =3D 151631 / 309927 > > >>> iteratorDeref: 0.524798 =3D 3415934 / 6509043 > > >>> iteratorCopy: 0.563628 =3D 5971275 / 10594360 > > >>> iteratorEquals: 1.18395 =3D 843533 / 712475 > > >>> iteratorInc: 0.864454 =3D 1460463 / 1689462 > > >>> iteratorDec: 0.373125 =3D 1606976 / 4306800 > > >>> iteratorDiff: 0.495226 =3D 235930 / 476409 > > >>> iteratorSub: 0.248254 =3D 13824 / 55685 > > >>> iteratorAdd: 0.590877 =3D 135839 / 229894 > > >>> iteratorAddAssign: 0.421852 =3D 119646 / 283621 > > >>> iteratorArithmetic: 0.569514 =3D 4416211 / 7754346 > > >>> micros: 0.543145 =3D 167865 / 309061 > > >>> > > >>> Distribution: decreasing_dupes(100000) > > >>> Example: [4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, = 0] > > >>> > > >>> move: 0.741695 =3D 1455276 / 1962096 > > >>> moveAssign: 0.736048 =3D 2826598 / 3840238 > > >>> compare: 0.641679 =3D 282851 / 440798 > > >>> iteratorDeref: 0.740565 =3D 3386062 / 4572271 > > >>> iteratorCopy: 0.628798 =3D 5008472 / 7965150 > > >>> iteratorEquals: 1.50206 =3D 794354 / 528843 > > >>> iteratorInc: 1.39602 =3D 2164546 / 1550510 > > >>> iteratorDec: 0.336639 =3D 837728 / 2488504 > > >>> iteratorDiff: 0.313457 =3D 88552 / 282501 > > >>> iteratorSub: 0.165388 =3D 3880 / 23460 > > >>> iteratorAdd: 0.377644 =3D 27139 / 71864 > > >>> iteratorAddAssign: 0.551364 =3D 182383 / 330785 > > >>> iteratorArithmetic: 0.776766 =3D 4098582 / 5276467 > > >>> micros: 0.71282 =3D 160398 / 225019 > > >>> > > >>> Distribution: mostly_decreasing(100000) > > >>> Example: [20, 19, 20, 18, 18, 15, 13, 13, 14, 13, 10, 9, 10, 8, 5, = 7, > > >>> 6, 4, 3, 2] > > >>> > > >>> move: 0.760602 =3D 2197435 / 2889075 > > >>> moveAssign: 0.768473 =3D 4591313 / 5974593 > > >>> compare: 0.932913 =3D 977678 / 1047984 > > >>> iteratorDeref: 0.838641 =3D 6633169 / 7909424 > > >>> iteratorCopy: 0.829028 =3D 15447960 / 18633830 > > >>> iteratorEquals: 1.7009 =3D 2000686 / 1176251 > > >>> iteratorInc: 1.52771 =3D 3788203 / 2479657 > > >>> iteratorDec: 0.363843 =3D 1369990 / 3765332 > > >>> iteratorDiff: 0.489186 =3D 595472 / 1217270 > > >>> iteratorSub: 0.0877827 =3D 10925 / 124455 > > >>> iteratorAdd: 0.201991 =3D 69640 / 344767 > > >>> iteratorAddAssign: 1.32756 =3D 1133512 / 853830 > > >>> iteratorArithmetic: 0.900303 =3D 8968428 / 9961562 > > >>> micros: 0.846298 =3D 377504 / 446065 > > >>> > > >>> Distribution: sawtooth_unique(100000) > > >>> Example: [0, 3, 6, 9, 12, 15, 18, 1, 4, 7, 10, 13, 16, 19, 2, 5, 8, > 11, 14, 17] > > >>> > > >>> move: 0.569953 =3D 2912465 / 5110007 > > >>> moveAssign: 0.577701 =3D 6012427 / 10407511 > > >>> compare: 0.824488 =3D 1604387 / 1945920 > > >>> iteratorDeref: 0.678979 =3D 9299794 / 13696724 > > >>> iteratorCopy: 0.72546 =3D 26157352 / 36056250 > > >>> iteratorEquals: 2.00683 =3D 4131766 / 2058856 > > >>> iteratorInc: 1.10728 =3D 6498257 / 5868688 > > >>> iteratorDec: 0.111225 =3D 552519 / 4967557 > > >>> iteratorDiff: 0.405328 =3D 988144 / 2437889 > > >>> iteratorSub: 0.026249 =3D 6327 / 241038 > > >>> iteratorAdd: 0.089034 =3D 55918 / 628052 > > >>> iteratorAddAssign: 1.18737 =3D 2297233 / 1934717 > > >>> iteratorArithmetic: 0.801143 =3D 14530164 / 18136797 > > >>> micros: 0.706282 =3D 583937 / 826776 > > >>> > > >>> Distribution: sawtooth_dupes(100000) > > >>> Example: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, = 3] > > >>> > > >>> move: 0.547721 =3D 2348873 / 4288447 > > >>> moveAssign: 0.543252 =3D 4613817 / 8492965 > > >>> compare: 0.664642 =3D 841503 / 1266099 > > >>> iteratorDeref: 0.599656 =3D 6292282 / 10493159 > > >>> iteratorCopy: 0.60663 =3D 15336816 / 25281976 > > >>> iteratorEquals: 2.24803 =3D 2859760 / 1272118 > > >>> iteratorInc: 1.07142 =3D 4999546 / 4666276 > > >>> iteratorDec: 0.0490551 =3D 205953 / 4198405 > > >>> iteratorDiff: 0.314518 =3D 473157 / 1504390 > > >>> iteratorSub: 0.0371951 =3D 5998 / 161258 > > >>> iteratorAdd: 0.0875684 =3D 34672 / 395942 > > >>> iteratorAddAssign: 0.900009 =3D 1233077 / 1370072 > > >>> iteratorArithmetic: 0.72316 =3D 9812163 / 13568461 > > >>> micros: 0.610891 =3D 353102 / 578011 > > >>> > > >>> Distribution: increasing_with_swap(100000) > > >>> Example: [19, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16= , > 17, 18, 0] > > >>> > > >>> move: 1 =3D 291782 / 291782 > > >>> moveAssign: 1 =3D 491779 / 491779 > > >>> compare: 0.57449 =3D 225064 / 391763 > > >>> iteratorDeref: 0.819553 =3D 941922 / 1149312 > > >>> iteratorCopy: 0.377964 =3D 1392386 / 3683915 > > >>> iteratorEquals: 0.525758 =3D 117408 / 223312 > > >>> iteratorInc: 0.863708 =3D 333050 / 385605 > > >>> iteratorDec: 0.999753 =3D 291716 / 291788 > > >>> iteratorDiff: 0.224408 =3D 42528 / 189512 > > >>> iteratorSub: 0.888889 =3D 104 / 117 > > >>> iteratorAdd: 0.997151 =3D 16801 / 16849 > > >>> iteratorAddAssign: 0.302604 =3D 82864 / 273836 > > >>> iteratorArithmetic: 0.640448 =3D 884471 / 1381019 > > >>> micros: 0.616124 =3D 36233 / 58808 > > >>> > > >>> Distribution: increasing_fast_slow(100000) > > >>> Example: [0, 4, 8, 12, 0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, > 16, 17, 18] > > >>> > > >>> move: 0.640348 =3D 825772 / 1289568 > > >>> moveAssign: 0.627081 =3D 1559792 / 2487384 > > >>> compare: 0.714235 =3D 431286 / 603843 > > >>> iteratorDeref: 0.694913 =3D 2421875 / 3485151 > > >>> iteratorCopy: 0.637176 =3D 5645877 / 8860786 > > >>> iteratorEquals: 1.7552 =3D 953880 / 543458 > > >>> iteratorInc: 1.08941 =3D 1676609 / 1539005 > > >>> iteratorDec: 0.135904 =3D 154533 / 1137073 > > >>> iteratorDiff: 0.388132 =3D 192787 / 496705 > > >>> iteratorSub: 0.00577964 =3D 141 / 24396 > > >>> iteratorAdd: 0.207287 =3D 17272 / 83324 > > >>> iteratorAddAssign: 0.849015 =3D 431320 / 508024 > > >>> iteratorArithmetic: 0.790987 =3D 3426542 / 4331985 > > >>> micros: 0.777699 =3D 153552 / 197444 > > >>> > > >>> > > >>> > > >>> #include > > >>> #include > > >>> #include > > >>> #include > > >>> #include > > >>> #include > > >>> #include > > >>> #include > > >>> #include > > >>> #include > > >>> > > >>> using namespace std; > > >>> > > >>> #define LINE0 0 > > >>> #define cout_line (cout << "line " << __LINE__ - LINE0 << ": ") > > >>> #define vout(v) #v << " =3D " << (v) > > >>> #define print1(v) cout_line << vout(v) << endl > > >>> #define print2(v1, v2) cout_line << vout(v1) << ", " << vout(v2) << > endl > > >>> #define print3(v1, v2, v3) cout_line << vout(v1) << ", " << vout(v2= ) > > >>> << ", " << vout(v3) << endl > > >>> #define GET_MACRO(_1,_2,_3,NAME,...) NAME > > >>> #define print(...) GET_MACRO(__VA_ARGS__, print3, print2, > print1)(__VA_ARGS__) > > >>> > > >>> template > > >>> basic_ostream & operator << (basic_ostream & os, > > >>> Container const& x); > > >>> > > >>> template > > >>> ostream & operator << (ostream & os, pair const& p) { > > >>> return os << p.first << ":" << p.second; > > >>> } > > >>> > > >>> template > > >>> basic_ostream & operator << (basic_ostream & os, > > >>> Container const& x) { > > >>> os << "["; > > >>> bool first =3D true; > > >>> for(auto const& y : x) { > > >>> if (!first) os << ", "; > > >>> os << y; > > >>> first =3D false; > > >>> } > > >>> return os << "]"; > > >>> } > > >>> > > >>> > > >>> struct Counter { > > >>> int construct =3D 0; > > >>> int copy =3D 0; > > >>> int assign =3D 0; > > >>> int move =3D 0; > > >>> int moveAssign =3D 0; > > >>> int compare =3D 0; > > >>> int iteratorDeref =3D 0; > > >>> int iteratorCopy =3D 0; > > >>> int iteratorEquals =3D 0; > > >>> int iteratorInc =3D 0; > > >>> int iteratorDec =3D 0; > > >>> int iteratorDiff =3D 0; > > >>> int iteratorSub =3D 0; > > >>> int iteratorAdd =3D 0; > > >>> int iteratorAddAssign =3D 0; > > >>> int iteratorArithmetic =3D 0; > > >>> int micros =3D 0; > > >>> }; > > >>> > > >>> int next_element_id =3D 0; > > >>> > > >>> class Element { > > >>> Counter* counter; > > >>> public: > > >>> int val; > > >>> int id; // Distinguis identical values to verify stability. > > >>> Element(int val, Counter* counter) > > >>> : val(val), counter(counter), id(++next_element_id) { > > >>> ++counter->construct; > > >>> } > > >>> > > >>> Element(Element const& other) > > >>> : val(other.val), counter(other.counter), id(other.id) { > > >>> ++counter->copy; > > >>> } > > >>> > > >>> Element& operator=3D(Element const& other) { > > >>> val =3D other.val; > > >>> counter =3D other.counter; > > >>> id =3D other.id; > > >>> ++counter->assign; > > >>> return *this; > > >>> } > > >>> > > >>> Element(Element&& other) noexcept > > >>> : val(move(other.val)), counter(other.counter), id(other.id= ) > { > > >>> ++counter->move; > > >>> } > > >>> > > >>> Element& operator=3D(Element&& other) noexcept { > > >>> val =3D move(other.val); > > >>> counter =3D other.counter; > > >>> id =3D other.id; > > >>> ++counter->moveAssign; > > >>> return *this; > > >>> } > > >>> > > >>> bool operator<(Element const& other) const { > > >>> ++counter->compare; > > >>> return val < other.val; > > >>> } > > >>> > > >>> bool operator=3D=3D(Element const& other) const { > > >>> return val =3D=3D other.val && id =3D=3D other.id; > > >>> } > > >>> }; > > >>> > > >>> ostream & operator << (ostream & os, Element const& e) { > > >>> return os << e.val << "(" << e.id << ")"; > > >>> } > > >>> > > >>> > > >>> class RandomIterator { > > >>> typedef vector::iterator It; > > >>> It it; > > >>> Counter* counter; > > >>> > > >>> public: > > >>> using iterator_category =3D std::random_access_iterator_tag; > > >>> using difference_type =3D std::ptrdiff_t; > > >>> using value_type =3D Element; > > >>> using pointer =3D value_type*; > > >>> using reference =3D value_type&; > > >>> > > >>> RandomIterator(It it, Counter* counter) : it(it), > counter(counter) { > > >>> } > > >>> > > >>> RandomIterator(RandomIterator const& other) : it(other.it), > > >>> counter(other.counter) { > > >>> counter->iteratorCopy +=3D 1; > > >>> } > > >>> > > >>> bool operator=3D=3D (RandomIterator other) { > > >>> counter->iteratorEquals +=3D 1; > > >>> counter->iteratorArithmetic +=3D 1; > > >>> return it =3D=3D other.it; > > >>> } > > >>> > > >>> RandomIterator operator++ () { > > >>> counter->iteratorInc +=3D 1; > > >>> counter->iteratorArithmetic +=3D 1; > > >>> return RandomIterator(++it, counter); > > >>> } > > >>> > > >>> RandomIterator operator-- () { > > >>> counter->iteratorDec +=3D 1; > > >>> counter->iteratorArithmetic +=3D 1; > > >>> return RandomIterator(--it, counter); > > >>> } > > >>> > > >>> Element& operator* () { > > >>> counter->iteratorDeref +=3D 1; > > >>> return *it; > > >>> } > > >>> > > >>> difference_type operator- (RandomIterator other) { > > >>> counter->iteratorDiff +=3D 1; > > >>> counter->iteratorArithmetic +=3D 1; > > >>> return it - other.it; > > >>> } > > >>> > > >>> RandomIterator operator- (difference_type diff) { > > >>> counter->iteratorSub +=3D 1; > > >>> counter->iteratorArithmetic +=3D 1; > > >>> return RandomIterator(it - diff, counter); > > >>> } > > >>> > > >>> RandomIterator operator+ (difference_type diff) { > > >>> counter->iteratorAdd +=3D 1; > > >>> counter->iteratorArithmetic +=3D 1; > > >>> return RandomIterator(it + diff, counter); > > >>> } > > >>> > > >>> RandomIterator operator+=3D (difference_type diff) { > > >>> counter->iteratorAddAssign +=3D 1; > > >>> counter->iteratorArithmetic +=3D 1; > > >>> return RandomIterator(it +=3D diff, counter); > > >>> } > > >>> }; > > >>> > > >>> > > >>> class TestCase { > > >>> Counter counter; > > >>> vector elements; > > >>> > > >>> public: > > >>> TestCase(vector const& data) { > > >>> elements.reserve(data.size()); > > >>> for (int i: data) { > > >>> elements.push_back(Element(i, &counter)); > > >>> } > > >>> } > > >>> > > >>> Counter sort( > > >>> void algo(RandomIterator, RandomIterator), > > >>> vector::iterator begin, > > >>> vector::iterator end) { > > >>> counter =3D Counter(); > > >>> auto t0 =3D std::chrono::system_clock::now(); > > >>> > > >>> algo(RandomIterator(begin, &counter), RandomIterator(end, > &counter)); > > >>> > > >>> auto t1 =3D std::chrono::system_clock::now(); > > >>> counter.micros =3D > > >>> std::chrono::duration_cast(t1 - > > >>> t0).count(); > > >>> return counter; > > >>> } > > >>> > > >>> pair benchmark( > > >>> void new_algo(RandomIterator, RandomIterator), > > >>> void cur_algo(RandomIterator, RandomIterator)) { > > >>> vector toSort1 =3D elements; > > >>> Counter counter1 =3D sort(new_algo, toSort1.begin(), > toSort1.end()); > > >>> > > >>> vector toSort2 =3D elements; > > >>> Counter counter2 =3D sort(cur_algo, toSort2.begin(), > toSort2.end()); > > >>> > > >>> if (toSort1 !=3D toSort2) { > > >>> cout << "INCORRECT RESULT!" << endl; > > >>> exit(1); > > >>> } > > >>> > > >>> return make_pair(counter1, counter2); > > >>> } > > >>> }; > > >>> > > >>> > > >>> void relative_metric(string name, int m1, int m2) { > > >>> if (m1 =3D=3D 0 && m2 =3D=3D 0) return; > > >>> if (m2 =3D=3D 0) { > > >>> cout << name << ": inf =3D " << m1 << " / " << m2 << endl; > > >>> } else { > > >>> cout << name << ": " << m1 / (double) m2 << " =3D " << m1 << = " / > " > > >>> << m2 << endl; > > >>> } > > >>> } > > >>> > > >>> > > >>> void relative_report(Counter const& c1, Counter const& c2) { > > >>> cout << endl; > > >>> relative_metric("construct", c1.construct, c2.construct); > > >>> relative_metric("copy", c1.copy, c2.copy); > > >>> relative_metric("assign", c1.assign, c2.assign); > > >>> relative_metric("move", c1.move, c2.move); > > >>> relative_metric("moveAssign", c1.moveAssign, c2.moveAssign); > > >>> relative_metric("compare", c1.compare, c2.compare); > > >>> relative_metric("iteratorDeref", c1.iteratorDeref, > c2.iteratorDeref); > > >>> relative_metric("iteratorCopy", c1.iteratorCopy, > c2.iteratorCopy); > > >>> relative_metric("iteratorEquals", c1.iteratorEquals, > c2.iteratorEquals); > > >>> relative_metric("iteratorInc", c1.iteratorInc, c2.iteratorInc); > > >>> relative_metric("iteratorDec", c1.iteratorDec, c2.iteratorDec); > > >>> relative_metric("iteratorDiff", c1.iteratorDiff, > c2.iteratorDiff); > > >>> relative_metric("iteratorSub", c1.iteratorSub, c2.iteratorSub); > > >>> relative_metric("iteratorAdd", c1.iteratorAdd, c2.iteratorAdd); > > >>> relative_metric("iteratorAddAssign", c1.iteratorAddAssign, > > >>> c2.iteratorAddAssign); > > >>> relative_metric("iteratorArithmetic", c1.iteratorArithmetic, > > >>> c2.iteratorArithmetic); > > >>> relative_metric("micros", c1.micros, c2.micros); > > >>> cout << endl; > > >>> } > > >>> > > >>> > > >>> /// This is a helper function for the merge routines. > > >>> template typename _Compare> > > >>> void _my_merge_without_buffer( > > >>> _BidirectionalIterator __first, > > >>> _BidirectionalIterator __middle, > > >>> _BidirectionalIterator __last, > > >>> _Distance __len1, _Distance __len2, > > >>> _Compare __comp) { > > >>> > > >>> if (__len1 =3D=3D 0 || __len2 =3D=3D 0) > > >>> return; > > >>> > > >>> if (__len1 + __len2 =3D=3D 2) { > > >>> if (__comp(__middle, __first)) > > >>> std::iter_swap(__first, __middle); > > >>> return; > > >>> } > > >>> > > >>> _BidirectionalIterator __first_cut =3D __first; > > >>> _BidirectionalIterator __second_cut =3D __middle; > > >>> _Distance __len11 =3D 0; > > >>> _Distance __len22 =3D 0; > > >>> > > >>> > > >>> #define NEW_ALGO 1 > > >>> > > >>> #ifdef NEW_ALGO > > >>> // Prevent stack overflow > > >>> if (__len1 > 4 * __len2 || __len2 > 4 * __len1) { > > >>> #endif > > >>> if (__len1 > __len2) > > >>> { > > >>> __len11 =3D __len1 / 2; > > >>> std::advance(__first_cut, __len11); > > >>> __second_cut > > >>> =3D std::__lower_bound(__middle, __last, *__first_cut, > > >>> __gnu_cxx::__ops::__iter_comp_val(__comp)); > > >>> __len22 =3D std::distance(__middle, __second_cut); > > >>> } > > >>> else > > >>> { > > >>> __len22 =3D __len2 / 2; > > >>> std::advance(__second_cut, __len22); > > >>> __first_cut > > >>> =3D std::__upper_bound(__first, __middle, *__second_cut= , > > >>> __gnu_cxx::__ops::__val_comp_iter(__comp)); > > >>> __len11 =3D std::distance(__first, __first_cut); > > >>> } > > >>> #ifdef NEW_ALGO > > >>> } else > > >>> { > > >>> _Distance __cut_range; > > >>> // __cut_len is always equal to __middle - __first_cut (and > eventually to > > >>> // __second_cut - __middle) but we keep track of it to avoid > calculating it > > >>> // later which can be expensive for bidirectional iterators. > > >>> _Distance __cut_len; > > >>> > > >>> if (__len1 > __len2) > > >>> { > > >>> std::advance(__first_cut, __len1 - __len2); > > >>> __cut_range =3D __len2; > > >>> } > > >>> else > > >>> { > > >>> __cut_range =3D __len1; > > >>> } > > >>> __cut_len =3D __cut_range; > > >>> > > >>> while (__cut_range > 0) { > > >>> _Distance __half_cut_range =3D __cut_range / 2; > > >>> _BidirectionalIterator __mid_first_cut =3D __first_cut; > > >>> std::advance(__mid_first_cut, __cut_range - __half_cut_range - 1); > > >>> _BidirectionalIterator __mid_second_cut =3D __second_cut; > > >>> std::advance(__mid_second_cut, __half_cut_range); > > >>> > > >>> if (__comp(__mid_second_cut, __mid_first_cut)) { > > >>> __cut_range -=3D __half_cut_range + 1; > > >>> __second_cut =3D __mid_second_cut; > > >>> ++__second_cut; > > >>> } else { > > >>> __cut_len -=3D __cut_range - __half_cut_range; > > >>> __cut_range =3D __half_cut_range; > > >>> __first_cut =3D __mid_first_cut; > > >>> ++__first_cut; > > >>> } > > >>> } > > >>> __len22 =3D __cut_len; > > >>> __len11 =3D __len1 - __cut_len; > > >>> } > > >>> #endif > > >>> > > >>> _BidirectionalIterator __new_middle > > >>> =3D std::rotate(__first_cut, __middle, __second_cut); > > >>> _my_merge_without_buffer(__first, __first_cut, __new_middle, > > >>> __len11, __len22, __comp); > > >>> _my_merge_without_buffer(__new_middle, __second_cut, __last, > > >>> __len1 - __len11, __len2 - __len22, __comp); > > >>> } > > >>> > > >>> /// This is a helper function for the stable sorting routines. > > >>> template > > >>> void > > >>> _my_inplace_stable_sort(_RandomAccessIterator __first, > > >>> _RandomAccessIterator __last, _Compare __comp) > > >>> { > > >>> > > >>> if (__last - __first < 15) > > >>> { > > >>> std::__insertion_sort(__first, __last, __comp); > > >>> return; > > >>> } > > >>> _RandomAccessIterator __middle =3D __first + (__last - __firs= t) > / 2; > > >>> _my_inplace_stable_sort(__first, __middle, __comp); > > >>> _my_inplace_stable_sort(__middle, __last, __comp); > > >>> _my_merge_without_buffer(__first, __middle, __last, > > >>> __middle - __first, > > >>> __last - __middle, > > >>> __comp); > > >>> } > > >>> > > >>> > > >>> template > > >>> void my_inplace_stable_sort(Iterator start, Iterator end) { > > >>> _my_inplace_stable_sort(start, end, > __gnu_cxx::__ops::__iter_less_iter()); > > >>> } > > >>> > > >>> template > > >>> void std_inplace_stable_sort(Iterator start, Iterator end) { > > >>> std::__inplace_stable_sort(start, end, > __gnu_cxx::__ops::__iter_less_iter()); > > >>> } > > >>> > > >>> void benchmark(string const& dist_name, > > >>> vector dist_func(int), > > >>> int n) { > > >>> vector v =3D dist_func(n); > > >>> cout << "Distribution: " << dist_name << "(" << n << ")" << end= l; > > >>> cout << "Example: " << dist_func(20) << endl; > > >>> TestCase testCase(v); > > >>> > > >>> pair counter_pair =3D > > >>> testCase.benchmark( > > >>> my_inplace_stable_sort > > >>> , std_inplace_stable_sort > > >>> ); > > >>> Counter cKloet =3D counter_pair.first; > > >>> Counter cInplace =3D counter_pair.second; > > >>> > > >>> relative_report(cKloet, cInplace); > > >>> } > > >>> > > >>> > > >>> vector random_unique(int n) { > > >>> vector v(n); > > >>> for (int i =3D 0; i < n; ++i) { > > >>> v[i] =3D i; > > >>> } > > >>> for (int i =3D n - 1; i >=3D 0; --i) { > > >>> swap(v[i], v[rand() % (i + 1)]); > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> vector random_dupes(int n) { > > >>> vector v(n); > > >>> int r =3D (int)sqrt(n); > > >>> for (int i =3D 0; i < n; ++i) { > > >>> v[i] =3D i / r; > > >>> } > > >>> for (int i =3D n - 1; i >=3D 0; --i) { > > >>> swap(v[i], v[rand() % (i + 1)]); > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> vector increasing_unique(int n) { > > >>> vector v(n); > > >>> for (int i =3D 0; i < n; ++i) { > > >>> v[i] =3D i; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> > > >>> vector increasing_dupes(int n) { > > >>> vector v(n); > > >>> int r =3D (int)sqrt(n); > > >>> for (int i =3D 0; i < n; ++i) { > > >>> v[i] =3D i / r; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> > > >>> vector mostly_increasing(int n) { > > >>> vector v(n); > > >>> int r =3D (int)sqrt(n); > > >>> for (int i =3D 0; i < n; ++i) { > > >>> v[i] =3D i + rand() % r; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> > > >>> vector decreasing_unique(int n) { > > >>> vector v(n); > > >>> for (int i =3D 0; i < n; ++i) { > > >>> v[i] =3D n - 1 - i; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> vector decreasing_dupes(int n) { > > >>> vector v(n); > > >>> int r =3D (int)sqrt(n); > > >>> for (int i =3D 0; i < n; ++i) { > > >>> v[i] =3D (n - 1 - i) / r; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> vector mostly_decreasing(int n) { > > >>> vector v(n); > > >>> int r =3D (int)sqrt(n); > > >>> for (int i =3D 0; i < n; ++i) { > > >>> v[i] =3D n - 1 - i + rand() % r; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> vector sawtooth_unique(int n) { > > >>> int r =3D (int)sqrt(n); > > >>> while (gcd(r, n) !=3D 1) ++r; > > >>> vector v(n); > > >>> for (int i =3D 0; i < n; ++i) { > > >>> v[(i * r) % n] =3D i; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> vector sawtooth_dupes(int n) { > > >>> int r =3D (int)sqrt(n); > > >>> vector v(n); > > >>> for (int i =3D 0; i < n; ++i) { > > >>> v[i] =3D i % r; > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> vector increasing_with_swap(int n) { > > >>> vector v(n); > > >>> for (int i =3D 0; i < n; ++i) { > > >>> v[i] =3D i; > > >>> } > > >>> swap(v[0], v[n - 1]); > > >>> return v; > > >>> } > > >>> > > >>> vector increasing_fast_slow(int n) { > > >>> int r =3D (int)sqrt(n); > > >>> vector v(n); > > >>> for (int i =3D 0; i < r; ++i) { > > >>> v[i] =3D i * r; > > >>> } > > >>> for (int i =3D r; i < n; ++i) { > > >>> v[i] =3D (i - r) * n / (n - r); > > >>> } > > >>> return v; > > >>> } > > >>> > > >>> > > >>> #define BENCHMARK(dist, n) benchmark(#dist, dist, n) > > >>> > > >>> int main() { > > >>> int n =3D 100000; > > >>> BENCHMARK(random_unique, n); > > >>> BENCHMARK(random_dupes, n); > > >>> BENCHMARK(increasing_unique, n); > > >>> BENCHMARK(increasing_dupes, n); > > >>> BENCHMARK(mostly_increasing, n); > > >>> BENCHMARK(decreasing_unique, n); > > >>> BENCHMARK(decreasing_dupes, n); > > >>> BENCHMARK(mostly_decreasing, n); > > >>> BENCHMARK(sawtooth_unique, n); > > >>> BENCHMARK(sawtooth_dupes, n); > > >>> BENCHMARK(increasing_with_swap, n); > > >>> BENCHMARK(increasing_fast_slow, n); > > >>> > > >>> return 0; > > >>> } > > >> > > >