From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot1-x333.google.com (mail-ot1-x333.google.com [IPv6:2607:f8b0:4864:20::333]) by sourceware.org (Postfix) with ESMTPS id 881F63858D3C for ; Thu, 25 Nov 2021 19:51:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 881F63858D3C Received: by mail-ot1-x333.google.com with SMTP id b5-20020a9d60c5000000b0055c6349ff22so10839565otk.13 for ; Thu, 25 Nov 2021 11:51:34 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=Ak5sahN2XTcPJhLgcj/JqtZznYbulHn4OJ39vMKPuHQ=; b=U6U9PKGwRIdSoiizvEOeardutTZzVzl05e55gyijUW4PuEO0HCsaD176UzVF8Mj2DF xOB/C2TaqvPr1E/600t1vQxDRAQrgw4M+CQy28GMe2FLd/XghEb9BpCLiVKQil8489wE 4PtTL6i12S4QBg2itPoC3KJ+wxFbtTcmlz4GBdM3fgnr72oP21tIq+UfV0O2b1kDuBQp oyZu5TA+Ms72EeABrIMOM6fomxhq8rzBcoAiIkOFcO2geXwoXjfnAL9tnMMFDyMvWbFs LABWDDknU05D21SOg4eJ2pTHt261AvLYO34F8XEGrS6M8Z3ca1KvzP+LBq7qEGEFOiyu axQg== X-Gm-Message-State: AOAM531xC0rzVIZ6MmlE2ECwmQFAfiHOF+uheCyvbrWEVzs2ztO5I4KD Qr9OLvLyVPMjEdzpzeLOIyxEVMyKDhlZBgr2MPo0Ne+r X-Google-Smtp-Source: ABdhPJzwF5DoR+YMjNK891jw/QUa+qpw8MIji6gAd2jG5RBJ4s7zka4+5tYX8m+vZriEuZRRxcj8Tr++5PH+HYyhwpA= X-Received: by 2002:a9d:70d4:: with SMTP id w20mr24240234otj.154.1637869893327; Thu, 25 Nov 2021 11:51:33 -0800 (PST) MIME-Version: 1.0 References: <9f7342af-398c-fa0f-4182-761644d6e5f3@gmail.com> <54dbf38f-6409-9971-bdec-457ac6d2473a@gmail.com> In-Reply-To: <54dbf38f-6409-9971-bdec-457ac6d2473a@gmail.com> From: David de Kloet Date: Thu, 25 Nov 2021 20:51:22 +0100 Message-ID: Subject: Re: __inplace_stable_sort improvement To: =?UTF-8?Q?Fran=C3=A7ois_Dumont?= Cc: libstdc++@gcc.gnu.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=0.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT, RCVD_IN_DNSWL_NONE, SCC_10_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libstdc++@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++ mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Nov 2021 19:51:38 -0000 On Thu, Nov 25, 2021 at 6:35 AM Fran=C3=A7ois Dumont = wrote: > > On 24/11/21 7:48 pm, David de Kloet wrote: > > On Wed, Nov 24, 2021 at 7:33 PM Fran=C3=A7ois Dumont wrote: > >> > >> On 18/11/21 1:58 pm, David de Kloet via Libstdc++ wrote: > >>> Hello libstdc++, > >>> > >>> Apologies if this is the wrong list for my message. I believe I have = an > >>> improvement for __inplace_stable_sort and I'm wondering if this is wo= rth > >>> pursuing and if so how to proceed. > >> Of course, anyone interested in enhancing the library is welcome. > >> > >> The best way to do so would be to provide a patch on the current libra= ry > >> code. I'm the last one to have touch to this algo and it would be easi= er > >> for me to understand if you try to implement it directly on the librar= y. > > _my_merge_without_buffer below preprocesses exactly to > > __merge_without_buffer from the library if you undefine NEW_ALGO so > > you can see exactly what my changes are. > > > > I currently don't have much time, but when I do I'd be happy to > > provide it as a patch. If you find time before me, you're also more > > than welcome to use the code. > > > >> The good news is that you are trying to enhance a piece of code rarely > >> used as it is only when the buffer allocation fails. So it is less > >> critical and so perhaps more open to modifications. > >> > >> During my last work on it I've added a performance test: > >> performance/25_algorithms/inplace_merge.cc. It is also supposed to che= ck > >> performance when we fail to get a buffer and so use > >> __merge_without_buffer. Use 'make check-performance' to run it, see > >> library manual guide. Do not hesitate to change it if you think that > >> your benches are better. > > I don't currently have the environment to `make check-performance`. I > > just copied the code from the web and adjusted it. But (again) when I > > have time, I'll look into that. > > > > If you know of any "contributing to stdlibc++" guides, that would be gr= eat too. > > https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.htm= l > > > > > >>> __inplace_stable_sort does a merge sort where in each recursive step = the 2 > >>> sublists are merged in-place. The merge step is itself also recursive= . In order > >>> to merge [A, B) and [B, C), a tail section S1 of [A, B) and a head se= ction S2 > >>> of [B, C) are found such that all elements in S1 are greater than all= elements > >>> in S2 and then those sections are swapped/rotated and the shorter sec= tions are > >>> merged recursively. This is done by the function __merge_without_buff= er. > >>> > >>> In the current implementation, __merge_without_buffer takes the half = point of > >>> the longer section and does a binary search (via lower_bound or upper= _bound) to > >>> find the corresponding cut point on the other section. Then it rotate= s the > >>> sections. > >>> > >>> My proposal is to find the cut points such that both rotated sections= have the > >>> same length such that they can always simply be swapped. > >> What do you mean by swapped ? > > I mean the first element of section 1 gets swapped with the first > > element of section 2. The second element of section 1 gets swapped > > with the second element of section 2. Etc. for each index. > > > >> I see in the code below that you are just > >> changing how the cut positions are computed and then call the same > >> std::rotate algo. I cannot see any reason why changing the position > >> would make the rotate faster. > > The implementation of rotate does a series of such section swaps. But > > if both section have the same length, it's only a single swap of > > sections which makes rotate about 1.5-2 times faster. > > Ok, I see now, you try to use as much as possible the std::swap_ranges > in __rotate for random access iterator. Maybe it means that this > optimization should be limited to this type of iterator. We'll see > depending on benches. The binary search implementation is a bit awkward for the bidi-iterator, so having a separate implementation for random access iterators might improve the readability. However, if you look at my benchmarks, it also does fewer comparisons than the current algorithm. But rotate() doesn't do any comparisons. So there must be another benefit to choosing the cut points this way, but I'm not entirely sure why. > > I'll have a try then when I have time. > > Thanks > > > > > The purpose of the code I added is to choose the cut positions such > > that the rotated sections are of equal length, which makes rotate > > faster by doing fewer element swaps, resulting in fewer calls to the > > copy/assignment constructor. > > > >>> I have implemented > >>> this and it seems to perform better in general and in the special cas= es that > >>> I've tried. > >>> > >>> Unfortunately, since we are no longer guaranteed to halve the longer = section in > >>> each step, there is a possibility for a stack overflow. To avoid this= , the new > >>> algorithm falls back to the old behavior whenever the sections have > >>> significantly different lengths. > >>> > >>> Also, where the current algorithm performs the binary search implicit= ly via > >>> lower_bound/upper_bound, the new algorithm has to implement the binar= y search > >>> explicitly because it has to look at both sections to determine the n= ext step > >>> in the search. > >>> > >>> I have created wrappers for my value objects as well as my iterators = to count > >>> the number of operations performed by the current and my algorithm an= d run > >>> benchmarks on various distributions. My code is at the bottom but bel= ow is the > >>> output as run on my machine. For each operation it outputs the ratio = between > >>> the number of operations performed by my algorithm over the number pe= rformed by > >>> the current algorithm. > >>> > >>> Thanks for your attention, > >>> David > >>> > >>> > >>> Distribution: random_unique(100000) > >>> Example: [19, 13, 5, 0, 10, 6, 15, 11, 16, 9, 14, 18, 4, 17, 1, 7, 8,= 12, 3, 2] > >>> > >>> move: 0.588026 =3D 3331848 / 5666154 > >>> moveAssign: 0.594789 =3D 6852819 / 11521431 > >>> compare: 0.914997 =3D 2028977 / 2217469 > >>> iteratorDeref: 0.718466 =3D 11001863 / 15312994 > >>> iteratorCopy: 0.791332 =3D 33977189 / 42936712 > >>> iteratorEquals: 2.27841 =3D 5174606 / 2271143 > >>> iteratorInc: 1.2143 =3D 7697971 / 6339449 > >>> iteratorDec: 0.0833529 =3D 473556 / 5681340 > >>> iteratorDiff: 0.444865 =3D 1362598 / 3062950 > >>> iteratorSub: 0.0200683 =3D 6272 / 312533 > >>> iteratorAdd: 0.0695467 =3D 57149 / 821736 > >>> iteratorAddAssign: 1.36491 =3D 3141949 / 2301950 > >>> iteratorArithmetic: 0.861623 =3D 17914101 / 20791101 > >>> micros: 0.663704 =3D 650825 / 980596 > >>> > >>> Distribution: random_dupes(100000) > >>> Example: [4, 4, 1, 3, 0, 3, 2, 2, 4, 4, 0, 3, 1, 0, 3, 2, 0, 2, 1, 1] > >>> > >>> move: 0.570041 =3D 2949623 / 5174401 > >>> moveAssign: 0.577736 =3D 6087819 / 10537375 > >>> compare: 0.845419 =3D 1530877 / 1810790 > >>> iteratorDeref: 0.678233 =3D 9232869 / 13613114 > >>> iteratorCopy: 0.742699 =3D 25652244 / 34539225 > >>> iteratorEquals: 2.41643 =3D 4123881 / 1706601 > >>> iteratorInc: 1.15617 =3D 6554374 / 5669028 > >>> iteratorDec: 0.101004 =3D 539991 / 5346219 > >>> iteratorDiff: 0.417093 =3D 959972 / 2301580 > >>> iteratorSub: 0.0350046 =3D 8882 / 253738 > >>> iteratorAdd: 0.0972571 =3D 64122 / 659304 > >>> iteratorAddAssign: 1.2279 =3D 2247197 / 1830107 > >>> iteratorArithmetic: 0.81605 =3D 14498419 / 17766577 > >>> micros: 0.699386 =3D 554426 / 792732 > >>> > >>> Distribution: increasing_unique(100000) > >>> Example: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1= 7, 18, 19] > >>> > >>> move: 1 =3D 91808 / 91808 > >>> moveAssign: 1 =3D 91808 / 91808 > >>> compare: 0.573591 =3D 224558 / 391495 > >>> iteratorDeref: 0.722385 =3D 540924 / 748803 > >>> iteratorCopy: 0.298175 =3D 976735 / 3275708 > >>> iteratorEquals: 0.522302 =3D 116383 / 222827 > >>> iteratorInc: 0.715444 =3D 132750 / 185549 > >>> iteratorDec: 1 =3D 91808 / 91808 > >>> iteratorDiff: 0.217455 =3D 40956 / 188342 > >>> iteratorAdd: 1 =3D 16383 / 16383 > >>> iteratorAddAssign: 0.299523 =3D 81884 / 273381 > >>> iteratorArithmetic: 0.49082 =3D 480164 / 978290 > >>> micros: 0.479792 =3D 22935 / 47802 > >>> > >>> Distribution: increasing_dupes(100000) > >>> Example: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4] > >>> > >>> move: 1 =3D 91808 / 91808 > >>> moveAssign: 1 =3D 91808 / 91808 > >>> compare: 0.573591 =3D 224558 / 391495 > >>> iteratorDeref: 0.722385 =3D 540924 / 748803 > >>> iteratorCopy: 0.298175 =3D 976735 / 3275708 > >>> iteratorEquals: 0.522302 =3D 116383 / 222827 > >>> iteratorInc: 0.715444 =3D 132750 / 185549 > >>> iteratorDec: 1 =3D 91808 / 91808 > >>> iteratorDiff: 0.217455 =3D 40956 / 188342 > >>> iteratorAdd: 1 =3D 16383 / 16383 > >>> iteratorAddAssign: 0.299523 =3D 81884 / 273381 > >>> iteratorArithmetic: 0.49082 =3D 480164 / 978290 > >>> micros: 0.449154 =3D 20675 / 46031 > >>> > >>> Distribution: mostly_increasing(100000) > >>> Example: [3, 2, 3, 4, 5, 5, 6, 7, 9, 9, 11, 13, 12, 14, 15, 17, 17, 1= 8, 21, 20] > >>> > >>> move: 0.681093 =3D 814725 / 1196203 > >>> moveAssign: 0.703421 =3D 1809563 / 2572519 > >>> compare: 0.924016 =3D 965320 / 1044700 > >>> iteratorDeref: 0.847765 =3D 3802634 / 4485480 > >>> iteratorCopy: 0.804068 =3D 11859111 / 14748894 > >>> iteratorEquals: 1.59826 =3D 1388752 / 868913 > >>> iteratorInc: 1.25659 =3D 1818421 / 1447111 > >>> iteratorDec: 0.383738 =3D 537108 / 1399674 > >>> iteratorDiff: 0.486488 =3D 562273 / 1155779 > >>> iteratorSub: 0.0638659 =3D 6666 / 104375 > >>> iteratorAdd: 0.184194 =3D 55586 / 301780 > >>> iteratorAddAssign: 1.30369 =3D 1105439 / 847932 > >>> iteratorArithmetic: 0.893672 =3D 5474245 / 6125564 > >>> micros: 0.790603 =3D 233953 / 295917 > >>> > >>> Distribution: decreasing_unique(100000) > >>> Example: [19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3= , 2, 1, 0] > >>> > >>> move: 0.40597 =3D 995312 / 2451688 > >>> moveAssign: 0.45785 =3D 2459840 / 5372592 > >>> compare: 0.489247 =3D 151631 / 309927 > >>> iteratorDeref: 0.524798 =3D 3415934 / 6509043 > >>> iteratorCopy: 0.563628 =3D 5971275 / 10594360 > >>> iteratorEquals: 1.18395 =3D 843533 / 712475 > >>> iteratorInc: 0.864454 =3D 1460463 / 1689462 > >>> iteratorDec: 0.373125 =3D 1606976 / 4306800 > >>> iteratorDiff: 0.495226 =3D 235930 / 476409 > >>> iteratorSub: 0.248254 =3D 13824 / 55685 > >>> iteratorAdd: 0.590877 =3D 135839 / 229894 > >>> iteratorAddAssign: 0.421852 =3D 119646 / 283621 > >>> iteratorArithmetic: 0.569514 =3D 4416211 / 7754346 > >>> micros: 0.543145 =3D 167865 / 309061 > >>> > >>> Distribution: decreasing_dupes(100000) > >>> Example: [4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0] > >>> > >>> move: 0.741695 =3D 1455276 / 1962096 > >>> moveAssign: 0.736048 =3D 2826598 / 3840238 > >>> compare: 0.641679 =3D 282851 / 440798 > >>> iteratorDeref: 0.740565 =3D 3386062 / 4572271 > >>> iteratorCopy: 0.628798 =3D 5008472 / 7965150 > >>> iteratorEquals: 1.50206 =3D 794354 / 528843 > >>> iteratorInc: 1.39602 =3D 2164546 / 1550510 > >>> iteratorDec: 0.336639 =3D 837728 / 2488504 > >>> iteratorDiff: 0.313457 =3D 88552 / 282501 > >>> iteratorSub: 0.165388 =3D 3880 / 23460 > >>> iteratorAdd: 0.377644 =3D 27139 / 71864 > >>> iteratorAddAssign: 0.551364 =3D 182383 / 330785 > >>> iteratorArithmetic: 0.776766 =3D 4098582 / 5276467 > >>> micros: 0.71282 =3D 160398 / 225019 > >>> > >>> Distribution: mostly_decreasing(100000) > >>> Example: [20, 19, 20, 18, 18, 15, 13, 13, 14, 13, 10, 9, 10, 8, 5, 7, > >>> 6, 4, 3, 2] > >>> > >>> move: 0.760602 =3D 2197435 / 2889075 > >>> moveAssign: 0.768473 =3D 4591313 / 5974593 > >>> compare: 0.932913 =3D 977678 / 1047984 > >>> iteratorDeref: 0.838641 =3D 6633169 / 7909424 > >>> iteratorCopy: 0.829028 =3D 15447960 / 18633830 > >>> iteratorEquals: 1.7009 =3D 2000686 / 1176251 > >>> iteratorInc: 1.52771 =3D 3788203 / 2479657 > >>> iteratorDec: 0.363843 =3D 1369990 / 3765332 > >>> iteratorDiff: 0.489186 =3D 595472 / 1217270 > >>> iteratorSub: 0.0877827 =3D 10925 / 124455 > >>> iteratorAdd: 0.201991 =3D 69640 / 344767 > >>> iteratorAddAssign: 1.32756 =3D 1133512 / 853830 > >>> iteratorArithmetic: 0.900303 =3D 8968428 / 9961562 > >>> micros: 0.846298 =3D 377504 / 446065 > >>> > >>> Distribution: sawtooth_unique(100000) > >>> Example: [0, 3, 6, 9, 12, 15, 18, 1, 4, 7, 10, 13, 16, 19, 2, 5, 8, 1= 1, 14, 17] > >>> > >>> move: 0.569953 =3D 2912465 / 5110007 > >>> moveAssign: 0.577701 =3D 6012427 / 10407511 > >>> compare: 0.824488 =3D 1604387 / 1945920 > >>> iteratorDeref: 0.678979 =3D 9299794 / 13696724 > >>> iteratorCopy: 0.72546 =3D 26157352 / 36056250 > >>> iteratorEquals: 2.00683 =3D 4131766 / 2058856 > >>> iteratorInc: 1.10728 =3D 6498257 / 5868688 > >>> iteratorDec: 0.111225 =3D 552519 / 4967557 > >>> iteratorDiff: 0.405328 =3D 988144 / 2437889 > >>> iteratorSub: 0.026249 =3D 6327 / 241038 > >>> iteratorAdd: 0.089034 =3D 55918 / 628052 > >>> iteratorAddAssign: 1.18737 =3D 2297233 / 1934717 > >>> iteratorArithmetic: 0.801143 =3D 14530164 / 18136797 > >>> micros: 0.706282 =3D 583937 / 826776 > >>> > >>> Distribution: sawtooth_dupes(100000) > >>> Example: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3] > >>> > >>> move: 0.547721 =3D 2348873 / 4288447 > >>> moveAssign: 0.543252 =3D 4613817 / 8492965 > >>> compare: 0.664642 =3D 841503 / 1266099 > >>> iteratorDeref: 0.599656 =3D 6292282 / 10493159 > >>> iteratorCopy: 0.60663 =3D 15336816 / 25281976 > >>> iteratorEquals: 2.24803 =3D 2859760 / 1272118 > >>> iteratorInc: 1.07142 =3D 4999546 / 4666276 > >>> iteratorDec: 0.0490551 =3D 205953 / 4198405 > >>> iteratorDiff: 0.314518 =3D 473157 / 1504390 > >>> iteratorSub: 0.0371951 =3D 5998 / 161258 > >>> iteratorAdd: 0.0875684 =3D 34672 / 395942 > >>> iteratorAddAssign: 0.900009 =3D 1233077 / 1370072 > >>> iteratorArithmetic: 0.72316 =3D 9812163 / 13568461 > >>> micros: 0.610891 =3D 353102 / 578011 > >>> > >>> Distribution: increasing_with_swap(100000) > >>> Example: [19, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, = 17, 18, 0] > >>> > >>> move: 1 =3D 291782 / 291782 > >>> moveAssign: 1 =3D 491779 / 491779 > >>> compare: 0.57449 =3D 225064 / 391763 > >>> iteratorDeref: 0.819553 =3D 941922 / 1149312 > >>> iteratorCopy: 0.377964 =3D 1392386 / 3683915 > >>> iteratorEquals: 0.525758 =3D 117408 / 223312 > >>> iteratorInc: 0.863708 =3D 333050 / 385605 > >>> iteratorDec: 0.999753 =3D 291716 / 291788 > >>> iteratorDiff: 0.224408 =3D 42528 / 189512 > >>> iteratorSub: 0.888889 =3D 104 / 117 > >>> iteratorAdd: 0.997151 =3D 16801 / 16849 > >>> iteratorAddAssign: 0.302604 =3D 82864 / 273836 > >>> iteratorArithmetic: 0.640448 =3D 884471 / 1381019 > >>> micros: 0.616124 =3D 36233 / 58808 > >>> > >>> Distribution: increasing_fast_slow(100000) > >>> Example: [0, 4, 8, 12, 0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16= , 17, 18] > >>> > >>> move: 0.640348 =3D 825772 / 1289568 > >>> moveAssign: 0.627081 =3D 1559792 / 2487384 > >>> compare: 0.714235 =3D 431286 / 603843 > >>> iteratorDeref: 0.694913 =3D 2421875 / 3485151 > >>> iteratorCopy: 0.637176 =3D 5645877 / 8860786 > >>> iteratorEquals: 1.7552 =3D 953880 / 543458 > >>> iteratorInc: 1.08941 =3D 1676609 / 1539005 > >>> iteratorDec: 0.135904 =3D 154533 / 1137073 > >>> iteratorDiff: 0.388132 =3D 192787 / 496705 > >>> iteratorSub: 0.00577964 =3D 141 / 24396 > >>> iteratorAdd: 0.207287 =3D 17272 / 83324 > >>> iteratorAddAssign: 0.849015 =3D 431320 / 508024 > >>> iteratorArithmetic: 0.790987 =3D 3426542 / 4331985 > >>> micros: 0.777699 =3D 153552 / 197444 > >>> > >>> > >>> > >>> #include > >>> #include > >>> #include > >>> #include > >>> #include > >>> #include > >>> #include > >>> #include > >>> #include > >>> #include > >>> > >>> using namespace std; > >>> > >>> #define LINE0 0 > >>> #define cout_line (cout << "line " << __LINE__ - LINE0 << ": ") > >>> #define vout(v) #v << " =3D " << (v) > >>> #define print1(v) cout_line << vout(v) << endl > >>> #define print2(v1, v2) cout_line << vout(v1) << ", " << vout(v2) << e= ndl > >>> #define print3(v1, v2, v3) cout_line << vout(v1) << ", " << vout(v2) > >>> << ", " << vout(v3) << endl > >>> #define GET_MACRO(_1,_2,_3,NAME,...) NAME > >>> #define print(...) GET_MACRO(__VA_ARGS__, print3, print2, print1)(__V= A_ARGS__) > >>> > >>> template > >>> basic_ostream & operator << (basic_ostream & os, > >>> Container const& x); > >>> > >>> template > >>> ostream & operator << (ostream & os, pair const& p) { > >>> return os << p.first << ":" << p.second; > >>> } > >>> > >>> template > >>> basic_ostream & operator << (basic_ostream & os, > >>> Container const& x) { > >>> os << "["; > >>> bool first =3D true; > >>> for(auto const& y : x) { > >>> if (!first) os << ", "; > >>> os << y; > >>> first =3D false; > >>> } > >>> return os << "]"; > >>> } > >>> > >>> > >>> struct Counter { > >>> int construct =3D 0; > >>> int copy =3D 0; > >>> int assign =3D 0; > >>> int move =3D 0; > >>> int moveAssign =3D 0; > >>> int compare =3D 0; > >>> int iteratorDeref =3D 0; > >>> int iteratorCopy =3D 0; > >>> int iteratorEquals =3D 0; > >>> int iteratorInc =3D 0; > >>> int iteratorDec =3D 0; > >>> int iteratorDiff =3D 0; > >>> int iteratorSub =3D 0; > >>> int iteratorAdd =3D 0; > >>> int iteratorAddAssign =3D 0; > >>> int iteratorArithmetic =3D 0; > >>> int micros =3D 0; > >>> }; > >>> > >>> int next_element_id =3D 0; > >>> > >>> class Element { > >>> Counter* counter; > >>> public: > >>> int val; > >>> int id; // Distinguis identical values to verify stability. > >>> Element(int val, Counter* counter) > >>> : val(val), counter(counter), id(++next_element_id) { > >>> ++counter->construct; > >>> } > >>> > >>> Element(Element const& other) > >>> : val(other.val), counter(other.counter), id(other.id) { > >>> ++counter->copy; > >>> } > >>> > >>> Element& operator=3D(Element const& other) { > >>> val =3D other.val; > >>> counter =3D other.counter; > >>> id =3D other.id; > >>> ++counter->assign; > >>> return *this; > >>> } > >>> > >>> Element(Element&& other) noexcept > >>> : val(move(other.val)), counter(other.counter), id(other.id) = { > >>> ++counter->move; > >>> } > >>> > >>> Element& operator=3D(Element&& other) noexcept { > >>> val =3D move(other.val); > >>> counter =3D other.counter; > >>> id =3D other.id; > >>> ++counter->moveAssign; > >>> return *this; > >>> } > >>> > >>> bool operator<(Element const& other) const { > >>> ++counter->compare; > >>> return val < other.val; > >>> } > >>> > >>> bool operator=3D=3D(Element const& other) const { > >>> return val =3D=3D other.val && id =3D=3D other.id; > >>> } > >>> }; > >>> > >>> ostream & operator << (ostream & os, Element const& e) { > >>> return os << e.val << "(" << e.id << ")"; > >>> } > >>> > >>> > >>> class RandomIterator { > >>> typedef vector::iterator It; > >>> It it; > >>> Counter* counter; > >>> > >>> public: > >>> using iterator_category =3D std::random_access_iterator_tag; > >>> using difference_type =3D std::ptrdiff_t; > >>> using value_type =3D Element; > >>> using pointer =3D value_type*; > >>> using reference =3D value_type&; > >>> > >>> RandomIterator(It it, Counter* counter) : it(it), counter(counter= ) { > >>> } > >>> > >>> RandomIterator(RandomIterator const& other) : it(other.it), > >>> counter(other.counter) { > >>> counter->iteratorCopy +=3D 1; > >>> } > >>> > >>> bool operator=3D=3D (RandomIterator other) { > >>> counter->iteratorEquals +=3D 1; > >>> counter->iteratorArithmetic +=3D 1; > >>> return it =3D=3D other.it; > >>> } > >>> > >>> RandomIterator operator++ () { > >>> counter->iteratorInc +=3D 1; > >>> counter->iteratorArithmetic +=3D 1; > >>> return RandomIterator(++it, counter); > >>> } > >>> > >>> RandomIterator operator-- () { > >>> counter->iteratorDec +=3D 1; > >>> counter->iteratorArithmetic +=3D 1; > >>> return RandomIterator(--it, counter); > >>> } > >>> > >>> Element& operator* () { > >>> counter->iteratorDeref +=3D 1; > >>> return *it; > >>> } > >>> > >>> difference_type operator- (RandomIterator other) { > >>> counter->iteratorDiff +=3D 1; > >>> counter->iteratorArithmetic +=3D 1; > >>> return it - other.it; > >>> } > >>> > >>> RandomIterator operator- (difference_type diff) { > >>> counter->iteratorSub +=3D 1; > >>> counter->iteratorArithmetic +=3D 1; > >>> return RandomIterator(it - diff, counter); > >>> } > >>> > >>> RandomIterator operator+ (difference_type diff) { > >>> counter->iteratorAdd +=3D 1; > >>> counter->iteratorArithmetic +=3D 1; > >>> return RandomIterator(it + diff, counter); > >>> } > >>> > >>> RandomIterator operator+=3D (difference_type diff) { > >>> counter->iteratorAddAssign +=3D 1; > >>> counter->iteratorArithmetic +=3D 1; > >>> return RandomIterator(it +=3D diff, counter); > >>> } > >>> }; > >>> > >>> > >>> class TestCase { > >>> Counter counter; > >>> vector elements; > >>> > >>> public: > >>> TestCase(vector const& data) { > >>> elements.reserve(data.size()); > >>> for (int i: data) { > >>> elements.push_back(Element(i, &counter)); > >>> } > >>> } > >>> > >>> Counter sort( > >>> void algo(RandomIterator, RandomIterator), > >>> vector::iterator begin, > >>> vector::iterator end) { > >>> counter =3D Counter(); > >>> auto t0 =3D std::chrono::system_clock::now(); > >>> > >>> algo(RandomIterator(begin, &counter), RandomIterator(end, &coun= ter)); > >>> > >>> auto t1 =3D std::chrono::system_clock::now(); > >>> counter.micros =3D > >>> std::chrono::duration_cast(t1 - > >>> t0).count(); > >>> return counter; > >>> } > >>> > >>> pair benchmark( > >>> void new_algo(RandomIterator, RandomIterator), > >>> void cur_algo(RandomIterator, RandomIterator)) { > >>> vector toSort1 =3D elements; > >>> Counter counter1 =3D sort(new_algo, toSort1.begin(), toSort1.en= d()); > >>> > >>> vector toSort2 =3D elements; > >>> Counter counter2 =3D sort(cur_algo, toSort2.begin(), toSort2.en= d()); > >>> > >>> if (toSort1 !=3D toSort2) { > >>> cout << "INCORRECT RESULT!" << endl; > >>> exit(1); > >>> } > >>> > >>> return make_pair(counter1, counter2); > >>> } > >>> }; > >>> > >>> > >>> void relative_metric(string name, int m1, int m2) { > >>> if (m1 =3D=3D 0 && m2 =3D=3D 0) return; > >>> if (m2 =3D=3D 0) { > >>> cout << name << ": inf =3D " << m1 << " / " << m2 << endl; > >>> } else { > >>> cout << name << ": " << m1 / (double) m2 << " =3D " << m1 << " = / " > >>> << m2 << endl; > >>> } > >>> } > >>> > >>> > >>> void relative_report(Counter const& c1, Counter const& c2) { > >>> cout << endl; > >>> relative_metric("construct", c1.construct, c2.construct); > >>> relative_metric("copy", c1.copy, c2.copy); > >>> relative_metric("assign", c1.assign, c2.assign); > >>> relative_metric("move", c1.move, c2.move); > >>> relative_metric("moveAssign", c1.moveAssign, c2.moveAssign); > >>> relative_metric("compare", c1.compare, c2.compare); > >>> relative_metric("iteratorDeref", c1.iteratorDeref, c2.iteratorDer= ef); > >>> relative_metric("iteratorCopy", c1.iteratorCopy, c2.iteratorCopy)= ; > >>> relative_metric("iteratorEquals", c1.iteratorEquals, c2.iteratorE= quals); > >>> relative_metric("iteratorInc", c1.iteratorInc, c2.iteratorInc); > >>> relative_metric("iteratorDec", c1.iteratorDec, c2.iteratorDec); > >>> relative_metric("iteratorDiff", c1.iteratorDiff, c2.iteratorDiff)= ; > >>> relative_metric("iteratorSub", c1.iteratorSub, c2.iteratorSub); > >>> relative_metric("iteratorAdd", c1.iteratorAdd, c2.iteratorAdd); > >>> relative_metric("iteratorAddAssign", c1.iteratorAddAssign, > >>> c2.iteratorAddAssign); > >>> relative_metric("iteratorArithmetic", c1.iteratorArithmetic, > >>> c2.iteratorArithmetic); > >>> relative_metric("micros", c1.micros, c2.micros); > >>> cout << endl; > >>> } > >>> > >>> > >>> /// This is a helper function for the merge routines. > >>> template > >>> void _my_merge_without_buffer( > >>> _BidirectionalIterator __first, > >>> _BidirectionalIterator __middle, > >>> _BidirectionalIterator __last, > >>> _Distance __len1, _Distance __len2, > >>> _Compare __comp) { > >>> > >>> if (__len1 =3D=3D 0 || __len2 =3D=3D 0) > >>> return; > >>> > >>> if (__len1 + __len2 =3D=3D 2) { > >>> if (__comp(__middle, __first)) > >>> std::iter_swap(__first, __middle); > >>> return; > >>> } > >>> > >>> _BidirectionalIterator __first_cut =3D __first; > >>> _BidirectionalIterator __second_cut =3D __middle; > >>> _Distance __len11 =3D 0; > >>> _Distance __len22 =3D 0; > >>> > >>> > >>> #define NEW_ALGO 1 > >>> > >>> #ifdef NEW_ALGO > >>> // Prevent stack overflow > >>> if (__len1 > 4 * __len2 || __len2 > 4 * __len1) { > >>> #endif > >>> if (__len1 > __len2) > >>> { > >>> __len11 =3D __len1 / 2; > >>> std::advance(__first_cut, __len11); > >>> __second_cut > >>> =3D std::__lower_bound(__middle, __last, *__first_cut, > >>> __gnu_cxx::__ops::__iter_comp_val(__comp)); > >>> __len22 =3D std::distance(__middle, __second_cut); > >>> } > >>> else > >>> { > >>> __len22 =3D __len2 / 2; > >>> std::advance(__second_cut, __len22); > >>> __first_cut > >>> =3D std::__upper_bound(__first, __middle, *__second_cut, > >>> __gnu_cxx::__ops::__val_comp_iter(__comp)); > >>> __len11 =3D std::distance(__first, __first_cut); > >>> } > >>> #ifdef NEW_ALGO > >>> } else > >>> { > >>> _Distance __cut_range; > >>> // __cut_len is always equal to __middle - __first_cut (and eve= ntually to > >>> // __second_cut - __middle) but we keep track of it to avoid ca= lculating it > >>> // later which can be expensive for bidirectional iterators. > >>> _Distance __cut_len; > >>> > >>> if (__len1 > __len2) > >>> { > >>> std::advance(__first_cut, __len1 - __len2); > >>> __cut_range =3D __len2; > >>> } > >>> else > >>> { > >>> __cut_range =3D __len1; > >>> } > >>> __cut_len =3D __cut_range; > >>> > >>> while (__cut_range > 0) { > >>> _Distance __half_cut_range =3D __cut_range / 2; > >>> _BidirectionalIterator __mid_first_cut =3D __first_cut; > >>> std::advance(__mid_first_cut, __cut_range - __half_cut_range - 1); > >>> _BidirectionalIterator __mid_second_cut =3D __second_cut; > >>> std::advance(__mid_second_cut, __half_cut_range); > >>> > >>> if (__comp(__mid_second_cut, __mid_first_cut)) { > >>> __cut_range -=3D __half_cut_range + 1; > >>> __second_cut =3D __mid_second_cut; > >>> ++__second_cut; > >>> } else { > >>> __cut_len -=3D __cut_range - __half_cut_range; > >>> __cut_range =3D __half_cut_range; > >>> __first_cut =3D __mid_first_cut; > >>> ++__first_cut; > >>> } > >>> } > >>> __len22 =3D __cut_len; > >>> __len11 =3D __len1 - __cut_len; > >>> } > >>> #endif > >>> > >>> _BidirectionalIterator __new_middle > >>> =3D std::rotate(__first_cut, __middle, __second_cut); > >>> _my_merge_without_buffer(__first, __first_cut, __new_middle, > >>> __len11, __len22, __comp); > >>> _my_merge_without_buffer(__new_middle, __second_cut, __last, > >>> __len1 - __len11, __len2 - __len22, __comp); > >>> } > >>> > >>> /// This is a helper function for the stable sorting routines. > >>> template > >>> void > >>> _my_inplace_stable_sort(_RandomAccessIterator __first, > >>> _RandomAccessIterator __last, _Compare __comp) > >>> { > >>> > >>> if (__last - __first < 15) > >>> { > >>> std::__insertion_sort(__first, __last, __comp); > >>> return; > >>> } > >>> _RandomAccessIterator __middle =3D __first + (__last - __first)= / 2; > >>> _my_inplace_stable_sort(__first, __middle, __comp); > >>> _my_inplace_stable_sort(__middle, __last, __comp); > >>> _my_merge_without_buffer(__first, __middle, __last, > >>> __middle - __first, > >>> __last - __middle, > >>> __comp); > >>> } > >>> > >>> > >>> template > >>> void my_inplace_stable_sort(Iterator start, Iterator end) { > >>> _my_inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less= _iter()); > >>> } > >>> > >>> template > >>> void std_inplace_stable_sort(Iterator start, Iterator end) { > >>> std::__inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_l= ess_iter()); > >>> } > >>> > >>> void benchmark(string const& dist_name, > >>> vector dist_func(int), > >>> int n) { > >>> vector v =3D dist_func(n); > >>> cout << "Distribution: " << dist_name << "(" << n << ")" << endl; > >>> cout << "Example: " << dist_func(20) << endl; > >>> TestCase testCase(v); > >>> > >>> pair counter_pair =3D > >>> testCase.benchmark( > >>> my_inplace_stable_sort > >>> , std_inplace_stable_sort > >>> ); > >>> Counter cKloet =3D counter_pair.first; > >>> Counter cInplace =3D counter_pair.second; > >>> > >>> relative_report(cKloet, cInplace); > >>> } > >>> > >>> > >>> vector random_unique(int n) { > >>> vector v(n); > >>> for (int i =3D 0; i < n; ++i) { > >>> v[i] =3D i; > >>> } > >>> for (int i =3D n - 1; i >=3D 0; --i) { > >>> swap(v[i], v[rand() % (i + 1)]); > >>> } > >>> return v; > >>> } > >>> > >>> vector random_dupes(int n) { > >>> vector v(n); > >>> int r =3D (int)sqrt(n); > >>> for (int i =3D 0; i < n; ++i) { > >>> v[i] =3D i / r; > >>> } > >>> for (int i =3D n - 1; i >=3D 0; --i) { > >>> swap(v[i], v[rand() % (i + 1)]); > >>> } > >>> return v; > >>> } > >>> > >>> vector increasing_unique(int n) { > >>> vector v(n); > >>> for (int i =3D 0; i < n; ++i) { > >>> v[i] =3D i; > >>> } > >>> return v; > >>> } > >>> > >>> > >>> vector increasing_dupes(int n) { > >>> vector v(n); > >>> int r =3D (int)sqrt(n); > >>> for (int i =3D 0; i < n; ++i) { > >>> v[i] =3D i / r; > >>> } > >>> return v; > >>> } > >>> > >>> > >>> vector mostly_increasing(int n) { > >>> vector v(n); > >>> int r =3D (int)sqrt(n); > >>> for (int i =3D 0; i < n; ++i) { > >>> v[i] =3D i + rand() % r; > >>> } > >>> return v; > >>> } > >>> > >>> > >>> vector decreasing_unique(int n) { > >>> vector v(n); > >>> for (int i =3D 0; i < n; ++i) { > >>> v[i] =3D n - 1 - i; > >>> } > >>> return v; > >>> } > >>> > >>> vector decreasing_dupes(int n) { > >>> vector v(n); > >>> int r =3D (int)sqrt(n); > >>> for (int i =3D 0; i < n; ++i) { > >>> v[i] =3D (n - 1 - i) / r; > >>> } > >>> return v; > >>> } > >>> > >>> vector mostly_decreasing(int n) { > >>> vector v(n); > >>> int r =3D (int)sqrt(n); > >>> for (int i =3D 0; i < n; ++i) { > >>> v[i] =3D n - 1 - i + rand() % r; > >>> } > >>> return v; > >>> } > >>> > >>> vector sawtooth_unique(int n) { > >>> int r =3D (int)sqrt(n); > >>> while (gcd(r, n) !=3D 1) ++r; > >>> vector v(n); > >>> for (int i =3D 0; i < n; ++i) { > >>> v[(i * r) % n] =3D i; > >>> } > >>> return v; > >>> } > >>> > >>> vector sawtooth_dupes(int n) { > >>> int r =3D (int)sqrt(n); > >>> vector v(n); > >>> for (int i =3D 0; i < n; ++i) { > >>> v[i] =3D i % r; > >>> } > >>> return v; > >>> } > >>> > >>> vector increasing_with_swap(int n) { > >>> vector v(n); > >>> for (int i =3D 0; i < n; ++i) { > >>> v[i] =3D i; > >>> } > >>> swap(v[0], v[n - 1]); > >>> return v; > >>> } > >>> > >>> vector increasing_fast_slow(int n) { > >>> int r =3D (int)sqrt(n); > >>> vector v(n); > >>> for (int i =3D 0; i < r; ++i) { > >>> v[i] =3D i * r; > >>> } > >>> for (int i =3D r; i < n; ++i) { > >>> v[i] =3D (i - r) * n / (n - r); > >>> } > >>> return v; > >>> } > >>> > >>> > >>> #define BENCHMARK(dist, n) benchmark(#dist, dist, n) > >>> > >>> int main() { > >>> int n =3D 100000; > >>> BENCHMARK(random_unique, n); > >>> BENCHMARK(random_dupes, n); > >>> BENCHMARK(increasing_unique, n); > >>> BENCHMARK(increasing_dupes, n); > >>> BENCHMARK(mostly_increasing, n); > >>> BENCHMARK(decreasing_unique, n); > >>> BENCHMARK(decreasing_dupes, n); > >>> BENCHMARK(mostly_decreasing, n); > >>> BENCHMARK(sawtooth_unique, n); > >>> BENCHMARK(sawtooth_dupes, n); > >>> BENCHMARK(increasing_with_swap, n); > >>> BENCHMARK(increasing_fast_slow, n); > >>> > >>> return 0; > >>> } > >> >