From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x32f.google.com (mail-wm1-x32f.google.com [IPv6:2a00:1450:4864:20::32f]) by sourceware.org (Postfix) with ESMTPS id 9FE3B3858403 for ; Thu, 25 Nov 2021 05:35:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9FE3B3858403 Received: by mail-wm1-x32f.google.com with SMTP id az34-20020a05600c602200b0033bf8662572so3855073wmb.0 for ; Wed, 24 Nov 2021 21:35:35 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=V6kozivXKBjQw5YqBWLMLNYnG6vC25H/AY5kBzJYlHU=; b=Soc9VTG88282wweed2R75USNS1oSYNcH0Kvu6abLcfqJOF60dp9jy5wyHz0FRhjbmN FHiYL9V/hqO7dZcVnBMIoH0LcXOqM/ggrJYxg2gzas4cccn9Spg65tOdemp2THIUA660 tVvOdx/zBSCsXJt1SAw9MmIZSWmflQCbTpos+dl1NLHH0VFjlxUqMKZbzOO+itmnK0XK GYjqT7oajIgZWGUHmJ3D9cLWW4faHM8byiJePmG+Wjf5UnpUs98wVpWX3gymRmmwh6Bc wMPeaXdbLft56MkTRXq98kpBVYqsh3SeHbPz8+MJsXlZ34YN5ivzJSKrMENVuWSZ2Oup hhqw== X-Gm-Message-State: AOAM532N+A0mWtAPzfR6iAHAv6skGlpvUOTbWCtt7+ohSWOGABkhWSZN 0l9VbQWzXnFkdyOzbRVwud2qyRCO72I= X-Google-Smtp-Source: ABdhPJzjVh2D+zRYWygOeZb//SbKvjwIM3ChoAkhzNUiwQIvc1Wnq0jXEK9XuvVKHKcNUk7oKodwNw== X-Received: by 2002:a05:600c:2052:: with SMTP id p18mr4264716wmg.3.1637818533679; Wed, 24 Nov 2021 21:35:33 -0800 (PST) Received: from [10.40.4.97] ([109.190.253.11]) by smtp.googlemail.com with ESMTPSA id a10sm7223277wmq.27.2021.11.24.21.35.31 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 24 Nov 2021 21:35:33 -0800 (PST) Subject: Re: __inplace_stable_sort improvement To: David de Kloet Cc: libstdc++@gcc.gnu.org References: <9f7342af-398c-fa0f-4182-761644d6e5f3@gmail.com> From: =?UTF-8?Q?Fran=c3=a7ois_Dumont?= Message-ID: <54dbf38f-6409-9971-bdec-457ac6d2473a@gmail.com> Date: Thu, 25 Nov 2021 06:35:05 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: fr X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT, NICE_REPLY_A, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE, RCVD_IN_SBL_CSS, SCC_10_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libstdc++@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++ mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Nov 2021 05:35:40 -0000 On 24/11/21 7:48 pm, David de Kloet wrote: > On Wed, Nov 24, 2021 at 7:33 PM François Dumont wrote: >> >> On 18/11/21 1:58 pm, David de Kloet via Libstdc++ wrote: >>> Hello libstdc++, >>> >>> Apologies if this is the wrong list for my message. I believe I have an >>> improvement for __inplace_stable_sort and I'm wondering if this is worth >>> pursuing and if so how to proceed. >> Of course, anyone interested in enhancing the library is welcome. >> >> The best way to do so would be to provide a patch on the current library >> code. I'm the last one to have touch to this algo and it would be easier >> for me to understand if you try to implement it directly on the library. > _my_merge_without_buffer below preprocesses exactly to > __merge_without_buffer from the library if you undefine NEW_ALGO so > you can see exactly what my changes are. > > I currently don't have much time, but when I do I'd be happy to > provide it as a patch. If you find time before me, you're also more > than welcome to use the code. > >> The good news is that you are trying to enhance a piece of code rarely >> used as it is only when the buffer allocation fails. So it is less >> critical and so perhaps more open to modifications. >> >> During my last work on it I've added a performance test: >> performance/25_algorithms/inplace_merge.cc. It is also supposed to check >> performance when we fail to get a buffer and so use >> __merge_without_buffer. Use 'make check-performance' to run it, see >> library manual guide. Do not hesitate to change it if you think that >> your benches are better. > I don't currently have the environment to `make check-performance`. I > just copied the code from the web and adjusted it. But (again) when I > have time, I'll look into that. > > If you know of any "contributing to stdlibc++" guides, that would be great too. https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html > >>> __inplace_stable_sort does a merge sort where in each recursive step the 2 >>> sublists are merged in-place. The merge step is itself also recursive. In order >>> to merge [A, B) and [B, C), a tail section S1 of [A, B) and a head section S2 >>> of [B, C) are found such that all elements in S1 are greater than all elements >>> in S2 and then those sections are swapped/rotated and the shorter sections are >>> merged recursively. This is done by the function __merge_without_buffer. >>> >>> In the current implementation, __merge_without_buffer takes the half point of >>> the longer section and does a binary search (via lower_bound or upper_bound) to >>> find the corresponding cut point on the other section. Then it rotates the >>> sections. >>> >>> My proposal is to find the cut points such that both rotated sections have the >>> same length such that they can always simply be swapped. >> What do you mean by swapped ? > I mean the first element of section 1 gets swapped with the first > element of section 2. The second element of section 1 gets swapped > with the second element of section 2. Etc. for each index. > >> I see in the code below that you are just >> changing how the cut positions are computed and then call the same >> std::rotate algo. I cannot see any reason why changing the position >> would make the rotate faster. > The implementation of rotate does a series of such section swaps. But > if both section have the same length, it's only a single swap of > sections which makes rotate about 1.5-2 times faster. Ok, I see now, you try to use as much as possible the std::swap_ranges in __rotate for random access iterator. Maybe it means that this optimization should be limited to this type of iterator. We'll see depending on benches. I'll have a try then when I have time. Thanks > > The purpose of the code I added is to choose the cut positions such > that the rotated sections are of equal length, which makes rotate > faster by doing fewer element swaps, resulting in fewer calls to the > copy/assignment constructor. > >>> I have implemented >>> this and it seems to perform better in general and in the special cases that >>> I've tried. >>> >>> Unfortunately, since we are no longer guaranteed to halve the longer section in >>> each step, there is a possibility for a stack overflow. To avoid this, the new >>> algorithm falls back to the old behavior whenever the sections have >>> significantly different lengths. >>> >>> Also, where the current algorithm performs the binary search implicitly via >>> lower_bound/upper_bound, the new algorithm has to implement the binary search >>> explicitly because it has to look at both sections to determine the next step >>> in the search. >>> >>> I have created wrappers for my value objects as well as my iterators to count >>> the number of operations performed by the current and my algorithm and run >>> benchmarks on various distributions. My code is at the bottom but below is the >>> output as run on my machine. For each operation it outputs the ratio between >>> the number of operations performed by my algorithm over the number performed by >>> the current algorithm. >>> >>> Thanks for your attention, >>> David >>> >>> >>> Distribution: random_unique(100000) >>> Example: [19, 13, 5, 0, 10, 6, 15, 11, 16, 9, 14, 18, 4, 17, 1, 7, 8, 12, 3, 2] >>> >>> move: 0.588026 = 3331848 / 5666154 >>> moveAssign: 0.594789 = 6852819 / 11521431 >>> compare: 0.914997 = 2028977 / 2217469 >>> iteratorDeref: 0.718466 = 11001863 / 15312994 >>> iteratorCopy: 0.791332 = 33977189 / 42936712 >>> iteratorEquals: 2.27841 = 5174606 / 2271143 >>> iteratorInc: 1.2143 = 7697971 / 6339449 >>> iteratorDec: 0.0833529 = 473556 / 5681340 >>> iteratorDiff: 0.444865 = 1362598 / 3062950 >>> iteratorSub: 0.0200683 = 6272 / 312533 >>> iteratorAdd: 0.0695467 = 57149 / 821736 >>> iteratorAddAssign: 1.36491 = 3141949 / 2301950 >>> iteratorArithmetic: 0.861623 = 17914101 / 20791101 >>> micros: 0.663704 = 650825 / 980596 >>> >>> Distribution: random_dupes(100000) >>> Example: [4, 4, 1, 3, 0, 3, 2, 2, 4, 4, 0, 3, 1, 0, 3, 2, 0, 2, 1, 1] >>> >>> move: 0.570041 = 2949623 / 5174401 >>> moveAssign: 0.577736 = 6087819 / 10537375 >>> compare: 0.845419 = 1530877 / 1810790 >>> iteratorDeref: 0.678233 = 9232869 / 13613114 >>> iteratorCopy: 0.742699 = 25652244 / 34539225 >>> iteratorEquals: 2.41643 = 4123881 / 1706601 >>> iteratorInc: 1.15617 = 6554374 / 5669028 >>> iteratorDec: 0.101004 = 539991 / 5346219 >>> iteratorDiff: 0.417093 = 959972 / 2301580 >>> iteratorSub: 0.0350046 = 8882 / 253738 >>> iteratorAdd: 0.0972571 = 64122 / 659304 >>> iteratorAddAssign: 1.2279 = 2247197 / 1830107 >>> iteratorArithmetic: 0.81605 = 14498419 / 17766577 >>> micros: 0.699386 = 554426 / 792732 >>> >>> Distribution: increasing_unique(100000) >>> Example: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] >>> >>> move: 1 = 91808 / 91808 >>> moveAssign: 1 = 91808 / 91808 >>> compare: 0.573591 = 224558 / 391495 >>> iteratorDeref: 0.722385 = 540924 / 748803 >>> iteratorCopy: 0.298175 = 976735 / 3275708 >>> iteratorEquals: 0.522302 = 116383 / 222827 >>> iteratorInc: 0.715444 = 132750 / 185549 >>> iteratorDec: 1 = 91808 / 91808 >>> iteratorDiff: 0.217455 = 40956 / 188342 >>> iteratorAdd: 1 = 16383 / 16383 >>> iteratorAddAssign: 0.299523 = 81884 / 273381 >>> iteratorArithmetic: 0.49082 = 480164 / 978290 >>> micros: 0.479792 = 22935 / 47802 >>> >>> Distribution: increasing_dupes(100000) >>> Example: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4] >>> >>> move: 1 = 91808 / 91808 >>> moveAssign: 1 = 91808 / 91808 >>> compare: 0.573591 = 224558 / 391495 >>> iteratorDeref: 0.722385 = 540924 / 748803 >>> iteratorCopy: 0.298175 = 976735 / 3275708 >>> iteratorEquals: 0.522302 = 116383 / 222827 >>> iteratorInc: 0.715444 = 132750 / 185549 >>> iteratorDec: 1 = 91808 / 91808 >>> iteratorDiff: 0.217455 = 40956 / 188342 >>> iteratorAdd: 1 = 16383 / 16383 >>> iteratorAddAssign: 0.299523 = 81884 / 273381 >>> iteratorArithmetic: 0.49082 = 480164 / 978290 >>> micros: 0.449154 = 20675 / 46031 >>> >>> Distribution: mostly_increasing(100000) >>> Example: [3, 2, 3, 4, 5, 5, 6, 7, 9, 9, 11, 13, 12, 14, 15, 17, 17, 18, 21, 20] >>> >>> move: 0.681093 = 814725 / 1196203 >>> moveAssign: 0.703421 = 1809563 / 2572519 >>> compare: 0.924016 = 965320 / 1044700 >>> iteratorDeref: 0.847765 = 3802634 / 4485480 >>> iteratorCopy: 0.804068 = 11859111 / 14748894 >>> iteratorEquals: 1.59826 = 1388752 / 868913 >>> iteratorInc: 1.25659 = 1818421 / 1447111 >>> iteratorDec: 0.383738 = 537108 / 1399674 >>> iteratorDiff: 0.486488 = 562273 / 1155779 >>> iteratorSub: 0.0638659 = 6666 / 104375 >>> iteratorAdd: 0.184194 = 55586 / 301780 >>> iteratorAddAssign: 1.30369 = 1105439 / 847932 >>> iteratorArithmetic: 0.893672 = 5474245 / 6125564 >>> micros: 0.790603 = 233953 / 295917 >>> >>> Distribution: decreasing_unique(100000) >>> Example: [19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0] >>> >>> move: 0.40597 = 995312 / 2451688 >>> moveAssign: 0.45785 = 2459840 / 5372592 >>> compare: 0.489247 = 151631 / 309927 >>> iteratorDeref: 0.524798 = 3415934 / 6509043 >>> iteratorCopy: 0.563628 = 5971275 / 10594360 >>> iteratorEquals: 1.18395 = 843533 / 712475 >>> iteratorInc: 0.864454 = 1460463 / 1689462 >>> iteratorDec: 0.373125 = 1606976 / 4306800 >>> iteratorDiff: 0.495226 = 235930 / 476409 >>> iteratorSub: 0.248254 = 13824 / 55685 >>> iteratorAdd: 0.590877 = 135839 / 229894 >>> iteratorAddAssign: 0.421852 = 119646 / 283621 >>> iteratorArithmetic: 0.569514 = 4416211 / 7754346 >>> micros: 0.543145 = 167865 / 309061 >>> >>> Distribution: decreasing_dupes(100000) >>> Example: [4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0] >>> >>> move: 0.741695 = 1455276 / 1962096 >>> moveAssign: 0.736048 = 2826598 / 3840238 >>> compare: 0.641679 = 282851 / 440798 >>> iteratorDeref: 0.740565 = 3386062 / 4572271 >>> iteratorCopy: 0.628798 = 5008472 / 7965150 >>> iteratorEquals: 1.50206 = 794354 / 528843 >>> iteratorInc: 1.39602 = 2164546 / 1550510 >>> iteratorDec: 0.336639 = 837728 / 2488504 >>> iteratorDiff: 0.313457 = 88552 / 282501 >>> iteratorSub: 0.165388 = 3880 / 23460 >>> iteratorAdd: 0.377644 = 27139 / 71864 >>> iteratorAddAssign: 0.551364 = 182383 / 330785 >>> iteratorArithmetic: 0.776766 = 4098582 / 5276467 >>> micros: 0.71282 = 160398 / 225019 >>> >>> Distribution: mostly_decreasing(100000) >>> Example: [20, 19, 20, 18, 18, 15, 13, 13, 14, 13, 10, 9, 10, 8, 5, 7, >>> 6, 4, 3, 2] >>> >>> move: 0.760602 = 2197435 / 2889075 >>> moveAssign: 0.768473 = 4591313 / 5974593 >>> compare: 0.932913 = 977678 / 1047984 >>> iteratorDeref: 0.838641 = 6633169 / 7909424 >>> iteratorCopy: 0.829028 = 15447960 / 18633830 >>> iteratorEquals: 1.7009 = 2000686 / 1176251 >>> iteratorInc: 1.52771 = 3788203 / 2479657 >>> iteratorDec: 0.363843 = 1369990 / 3765332 >>> iteratorDiff: 0.489186 = 595472 / 1217270 >>> iteratorSub: 0.0877827 = 10925 / 124455 >>> iteratorAdd: 0.201991 = 69640 / 344767 >>> iteratorAddAssign: 1.32756 = 1133512 / 853830 >>> iteratorArithmetic: 0.900303 = 8968428 / 9961562 >>> micros: 0.846298 = 377504 / 446065 >>> >>> Distribution: sawtooth_unique(100000) >>> Example: [0, 3, 6, 9, 12, 15, 18, 1, 4, 7, 10, 13, 16, 19, 2, 5, 8, 11, 14, 17] >>> >>> move: 0.569953 = 2912465 / 5110007 >>> moveAssign: 0.577701 = 6012427 / 10407511 >>> compare: 0.824488 = 1604387 / 1945920 >>> iteratorDeref: 0.678979 = 9299794 / 13696724 >>> iteratorCopy: 0.72546 = 26157352 / 36056250 >>> iteratorEquals: 2.00683 = 4131766 / 2058856 >>> iteratorInc: 1.10728 = 6498257 / 5868688 >>> iteratorDec: 0.111225 = 552519 / 4967557 >>> iteratorDiff: 0.405328 = 988144 / 2437889 >>> iteratorSub: 0.026249 = 6327 / 241038 >>> iteratorAdd: 0.089034 = 55918 / 628052 >>> iteratorAddAssign: 1.18737 = 2297233 / 1934717 >>> iteratorArithmetic: 0.801143 = 14530164 / 18136797 >>> micros: 0.706282 = 583937 / 826776 >>> >>> Distribution: sawtooth_dupes(100000) >>> Example: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3] >>> >>> move: 0.547721 = 2348873 / 4288447 >>> moveAssign: 0.543252 = 4613817 / 8492965 >>> compare: 0.664642 = 841503 / 1266099 >>> iteratorDeref: 0.599656 = 6292282 / 10493159 >>> iteratorCopy: 0.60663 = 15336816 / 25281976 >>> iteratorEquals: 2.24803 = 2859760 / 1272118 >>> iteratorInc: 1.07142 = 4999546 / 4666276 >>> iteratorDec: 0.0490551 = 205953 / 4198405 >>> iteratorDiff: 0.314518 = 473157 / 1504390 >>> iteratorSub: 0.0371951 = 5998 / 161258 >>> iteratorAdd: 0.0875684 = 34672 / 395942 >>> iteratorAddAssign: 0.900009 = 1233077 / 1370072 >>> iteratorArithmetic: 0.72316 = 9812163 / 13568461 >>> micros: 0.610891 = 353102 / 578011 >>> >>> Distribution: increasing_with_swap(100000) >>> Example: [19, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 0] >>> >>> move: 1 = 291782 / 291782 >>> moveAssign: 1 = 491779 / 491779 >>> compare: 0.57449 = 225064 / 391763 >>> iteratorDeref: 0.819553 = 941922 / 1149312 >>> iteratorCopy: 0.377964 = 1392386 / 3683915 >>> iteratorEquals: 0.525758 = 117408 / 223312 >>> iteratorInc: 0.863708 = 333050 / 385605 >>> iteratorDec: 0.999753 = 291716 / 291788 >>> iteratorDiff: 0.224408 = 42528 / 189512 >>> iteratorSub: 0.888889 = 104 / 117 >>> iteratorAdd: 0.997151 = 16801 / 16849 >>> iteratorAddAssign: 0.302604 = 82864 / 273836 >>> iteratorArithmetic: 0.640448 = 884471 / 1381019 >>> micros: 0.616124 = 36233 / 58808 >>> >>> Distribution: increasing_fast_slow(100000) >>> Example: [0, 4, 8, 12, 0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18] >>> >>> move: 0.640348 = 825772 / 1289568 >>> moveAssign: 0.627081 = 1559792 / 2487384 >>> compare: 0.714235 = 431286 / 603843 >>> iteratorDeref: 0.694913 = 2421875 / 3485151 >>> iteratorCopy: 0.637176 = 5645877 / 8860786 >>> iteratorEquals: 1.7552 = 953880 / 543458 >>> iteratorInc: 1.08941 = 1676609 / 1539005 >>> iteratorDec: 0.135904 = 154533 / 1137073 >>> iteratorDiff: 0.388132 = 192787 / 496705 >>> iteratorSub: 0.00577964 = 141 / 24396 >>> iteratorAdd: 0.207287 = 17272 / 83324 >>> iteratorAddAssign: 0.849015 = 431320 / 508024 >>> iteratorArithmetic: 0.790987 = 3426542 / 4331985 >>> micros: 0.777699 = 153552 / 197444 >>> >>> >>> >>> #include >>> #include >>> #include >>> #include >>> #include >>> #include >>> #include >>> #include >>> #include >>> #include >>> >>> using namespace std; >>> >>> #define LINE0 0 >>> #define cout_line (cout << "line " << __LINE__ - LINE0 << ": ") >>> #define vout(v) #v << " = " << (v) >>> #define print1(v) cout_line << vout(v) << endl >>> #define print2(v1, v2) cout_line << vout(v1) << ", " << vout(v2) << endl >>> #define print3(v1, v2, v3) cout_line << vout(v1) << ", " << vout(v2) >>> << ", " << vout(v3) << endl >>> #define GET_MACRO(_1,_2,_3,NAME,...) NAME >>> #define print(...) GET_MACRO(__VA_ARGS__, print3, print2, print1)(__VA_ARGS__) >>> >>> template >>> basic_ostream & operator << (basic_ostream & os, >>> Container const& x); >>> >>> template >>> ostream & operator << (ostream & os, pair const& p) { >>> return os << p.first << ":" << p.second; >>> } >>> >>> template >>> basic_ostream & operator << (basic_ostream & os, >>> Container const& x) { >>> os << "["; >>> bool first = true; >>> for(auto const& y : x) { >>> if (!first) os << ", "; >>> os << y; >>> first = false; >>> } >>> return os << "]"; >>> } >>> >>> >>> struct Counter { >>> int construct = 0; >>> int copy = 0; >>> int assign = 0; >>> int move = 0; >>> int moveAssign = 0; >>> int compare = 0; >>> int iteratorDeref = 0; >>> int iteratorCopy = 0; >>> int iteratorEquals = 0; >>> int iteratorInc = 0; >>> int iteratorDec = 0; >>> int iteratorDiff = 0; >>> int iteratorSub = 0; >>> int iteratorAdd = 0; >>> int iteratorAddAssign = 0; >>> int iteratorArithmetic = 0; >>> int micros = 0; >>> }; >>> >>> int next_element_id = 0; >>> >>> class Element { >>> Counter* counter; >>> public: >>> int val; >>> int id; // Distinguis identical values to verify stability. >>> Element(int val, Counter* counter) >>> : val(val), counter(counter), id(++next_element_id) { >>> ++counter->construct; >>> } >>> >>> Element(Element const& other) >>> : val(other.val), counter(other.counter), id(other.id) { >>> ++counter->copy; >>> } >>> >>> Element& operator=(Element const& other) { >>> val = other.val; >>> counter = other.counter; >>> id = other.id; >>> ++counter->assign; >>> return *this; >>> } >>> >>> Element(Element&& other) noexcept >>> : val(move(other.val)), counter(other.counter), id(other.id) { >>> ++counter->move; >>> } >>> >>> Element& operator=(Element&& other) noexcept { >>> val = move(other.val); >>> counter = other.counter; >>> id = other.id; >>> ++counter->moveAssign; >>> return *this; >>> } >>> >>> bool operator<(Element const& other) const { >>> ++counter->compare; >>> return val < other.val; >>> } >>> >>> bool operator==(Element const& other) const { >>> return val == other.val && id == other.id; >>> } >>> }; >>> >>> ostream & operator << (ostream & os, Element const& e) { >>> return os << e.val << "(" << e.id << ")"; >>> } >>> >>> >>> class RandomIterator { >>> typedef vector::iterator It; >>> It it; >>> Counter* counter; >>> >>> public: >>> using iterator_category = std::random_access_iterator_tag; >>> using difference_type = std::ptrdiff_t; >>> using value_type = Element; >>> using pointer = value_type*; >>> using reference = value_type&; >>> >>> RandomIterator(It it, Counter* counter) : it(it), counter(counter) { >>> } >>> >>> RandomIterator(RandomIterator const& other) : it(other.it), >>> counter(other.counter) { >>> counter->iteratorCopy += 1; >>> } >>> >>> bool operator== (RandomIterator other) { >>> counter->iteratorEquals += 1; >>> counter->iteratorArithmetic += 1; >>> return it == other.it; >>> } >>> >>> RandomIterator operator++ () { >>> counter->iteratorInc += 1; >>> counter->iteratorArithmetic += 1; >>> return RandomIterator(++it, counter); >>> } >>> >>> RandomIterator operator-- () { >>> counter->iteratorDec += 1; >>> counter->iteratorArithmetic += 1; >>> return RandomIterator(--it, counter); >>> } >>> >>> Element& operator* () { >>> counter->iteratorDeref += 1; >>> return *it; >>> } >>> >>> difference_type operator- (RandomIterator other) { >>> counter->iteratorDiff += 1; >>> counter->iteratorArithmetic += 1; >>> return it - other.it; >>> } >>> >>> RandomIterator operator- (difference_type diff) { >>> counter->iteratorSub += 1; >>> counter->iteratorArithmetic += 1; >>> return RandomIterator(it - diff, counter); >>> } >>> >>> RandomIterator operator+ (difference_type diff) { >>> counter->iteratorAdd += 1; >>> counter->iteratorArithmetic += 1; >>> return RandomIterator(it + diff, counter); >>> } >>> >>> RandomIterator operator+= (difference_type diff) { >>> counter->iteratorAddAssign += 1; >>> counter->iteratorArithmetic += 1; >>> return RandomIterator(it += diff, counter); >>> } >>> }; >>> >>> >>> class TestCase { >>> Counter counter; >>> vector elements; >>> >>> public: >>> TestCase(vector const& data) { >>> elements.reserve(data.size()); >>> for (int i: data) { >>> elements.push_back(Element(i, &counter)); >>> } >>> } >>> >>> Counter sort( >>> void algo(RandomIterator, RandomIterator), >>> vector::iterator begin, >>> vector::iterator end) { >>> counter = Counter(); >>> auto t0 = std::chrono::system_clock::now(); >>> >>> algo(RandomIterator(begin, &counter), RandomIterator(end, &counter)); >>> >>> auto t1 = std::chrono::system_clock::now(); >>> counter.micros = >>> std::chrono::duration_cast(t1 - >>> t0).count(); >>> return counter; >>> } >>> >>> pair benchmark( >>> void new_algo(RandomIterator, RandomIterator), >>> void cur_algo(RandomIterator, RandomIterator)) { >>> vector toSort1 = elements; >>> Counter counter1 = sort(new_algo, toSort1.begin(), toSort1.end()); >>> >>> vector toSort2 = elements; >>> Counter counter2 = sort(cur_algo, toSort2.begin(), toSort2.end()); >>> >>> if (toSort1 != toSort2) { >>> cout << "INCORRECT RESULT!" << endl; >>> exit(1); >>> } >>> >>> return make_pair(counter1, counter2); >>> } >>> }; >>> >>> >>> void relative_metric(string name, int m1, int m2) { >>> if (m1 == 0 && m2 == 0) return; >>> if (m2 == 0) { >>> cout << name << ": inf = " << m1 << " / " << m2 << endl; >>> } else { >>> cout << name << ": " << m1 / (double) m2 << " = " << m1 << " / " >>> << m2 << endl; >>> } >>> } >>> >>> >>> void relative_report(Counter const& c1, Counter const& c2) { >>> cout << endl; >>> relative_metric("construct", c1.construct, c2.construct); >>> relative_metric("copy", c1.copy, c2.copy); >>> relative_metric("assign", c1.assign, c2.assign); >>> relative_metric("move", c1.move, c2.move); >>> relative_metric("moveAssign", c1.moveAssign, c2.moveAssign); >>> relative_metric("compare", c1.compare, c2.compare); >>> relative_metric("iteratorDeref", c1.iteratorDeref, c2.iteratorDeref); >>> relative_metric("iteratorCopy", c1.iteratorCopy, c2.iteratorCopy); >>> relative_metric("iteratorEquals", c1.iteratorEquals, c2.iteratorEquals); >>> relative_metric("iteratorInc", c1.iteratorInc, c2.iteratorInc); >>> relative_metric("iteratorDec", c1.iteratorDec, c2.iteratorDec); >>> relative_metric("iteratorDiff", c1.iteratorDiff, c2.iteratorDiff); >>> relative_metric("iteratorSub", c1.iteratorSub, c2.iteratorSub); >>> relative_metric("iteratorAdd", c1.iteratorAdd, c2.iteratorAdd); >>> relative_metric("iteratorAddAssign", c1.iteratorAddAssign, >>> c2.iteratorAddAssign); >>> relative_metric("iteratorArithmetic", c1.iteratorArithmetic, >>> c2.iteratorArithmetic); >>> relative_metric("micros", c1.micros, c2.micros); >>> cout << endl; >>> } >>> >>> >>> /// This is a helper function for the merge routines. >>> template >>> void _my_merge_without_buffer( >>> _BidirectionalIterator __first, >>> _BidirectionalIterator __middle, >>> _BidirectionalIterator __last, >>> _Distance __len1, _Distance __len2, >>> _Compare __comp) { >>> >>> if (__len1 == 0 || __len2 == 0) >>> return; >>> >>> if (__len1 + __len2 == 2) { >>> if (__comp(__middle, __first)) >>> std::iter_swap(__first, __middle); >>> return; >>> } >>> >>> _BidirectionalIterator __first_cut = __first; >>> _BidirectionalIterator __second_cut = __middle; >>> _Distance __len11 = 0; >>> _Distance __len22 = 0; >>> >>> >>> #define NEW_ALGO 1 >>> >>> #ifdef NEW_ALGO >>> // Prevent stack overflow >>> if (__len1 > 4 * __len2 || __len2 > 4 * __len1) { >>> #endif >>> if (__len1 > __len2) >>> { >>> __len11 = __len1 / 2; >>> std::advance(__first_cut, __len11); >>> __second_cut >>> = std::__lower_bound(__middle, __last, *__first_cut, >>> __gnu_cxx::__ops::__iter_comp_val(__comp)); >>> __len22 = std::distance(__middle, __second_cut); >>> } >>> else >>> { >>> __len22 = __len2 / 2; >>> std::advance(__second_cut, __len22); >>> __first_cut >>> = std::__upper_bound(__first, __middle, *__second_cut, >>> __gnu_cxx::__ops::__val_comp_iter(__comp)); >>> __len11 = std::distance(__first, __first_cut); >>> } >>> #ifdef NEW_ALGO >>> } else >>> { >>> _Distance __cut_range; >>> // __cut_len is always equal to __middle - __first_cut (and eventually to >>> // __second_cut - __middle) but we keep track of it to avoid calculating it >>> // later which can be expensive for bidirectional iterators. >>> _Distance __cut_len; >>> >>> if (__len1 > __len2) >>> { >>> std::advance(__first_cut, __len1 - __len2); >>> __cut_range = __len2; >>> } >>> else >>> { >>> __cut_range = __len1; >>> } >>> __cut_len = __cut_range; >>> >>> while (__cut_range > 0) { >>> _Distance __half_cut_range = __cut_range / 2; >>> _BidirectionalIterator __mid_first_cut = __first_cut; >>> std::advance(__mid_first_cut, __cut_range - __half_cut_range - 1); >>> _BidirectionalIterator __mid_second_cut = __second_cut; >>> std::advance(__mid_second_cut, __half_cut_range); >>> >>> if (__comp(__mid_second_cut, __mid_first_cut)) { >>> __cut_range -= __half_cut_range + 1; >>> __second_cut = __mid_second_cut; >>> ++__second_cut; >>> } else { >>> __cut_len -= __cut_range - __half_cut_range; >>> __cut_range = __half_cut_range; >>> __first_cut = __mid_first_cut; >>> ++__first_cut; >>> } >>> } >>> __len22 = __cut_len; >>> __len11 = __len1 - __cut_len; >>> } >>> #endif >>> >>> _BidirectionalIterator __new_middle >>> = std::rotate(__first_cut, __middle, __second_cut); >>> _my_merge_without_buffer(__first, __first_cut, __new_middle, >>> __len11, __len22, __comp); >>> _my_merge_without_buffer(__new_middle, __second_cut, __last, >>> __len1 - __len11, __len2 - __len22, __comp); >>> } >>> >>> /// This is a helper function for the stable sorting routines. >>> template >>> void >>> _my_inplace_stable_sort(_RandomAccessIterator __first, >>> _RandomAccessIterator __last, _Compare __comp) >>> { >>> >>> if (__last - __first < 15) >>> { >>> std::__insertion_sort(__first, __last, __comp); >>> return; >>> } >>> _RandomAccessIterator __middle = __first + (__last - __first) / 2; >>> _my_inplace_stable_sort(__first, __middle, __comp); >>> _my_inplace_stable_sort(__middle, __last, __comp); >>> _my_merge_without_buffer(__first, __middle, __last, >>> __middle - __first, >>> __last - __middle, >>> __comp); >>> } >>> >>> >>> template >>> void my_inplace_stable_sort(Iterator start, Iterator end) { >>> _my_inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less_iter()); >>> } >>> >>> template >>> void std_inplace_stable_sort(Iterator start, Iterator end) { >>> std::__inplace_stable_sort(start, end, __gnu_cxx::__ops::__iter_less_iter()); >>> } >>> >>> void benchmark(string const& dist_name, >>> vector dist_func(int), >>> int n) { >>> vector v = dist_func(n); >>> cout << "Distribution: " << dist_name << "(" << n << ")" << endl; >>> cout << "Example: " << dist_func(20) << endl; >>> TestCase testCase(v); >>> >>> pair counter_pair = >>> testCase.benchmark( >>> my_inplace_stable_sort >>> , std_inplace_stable_sort >>> ); >>> Counter cKloet = counter_pair.first; >>> Counter cInplace = counter_pair.second; >>> >>> relative_report(cKloet, cInplace); >>> } >>> >>> >>> vector random_unique(int n) { >>> vector v(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = i; >>> } >>> for (int i = n - 1; i >= 0; --i) { >>> swap(v[i], v[rand() % (i + 1)]); >>> } >>> return v; >>> } >>> >>> vector random_dupes(int n) { >>> vector v(n); >>> int r = (int)sqrt(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = i / r; >>> } >>> for (int i = n - 1; i >= 0; --i) { >>> swap(v[i], v[rand() % (i + 1)]); >>> } >>> return v; >>> } >>> >>> vector increasing_unique(int n) { >>> vector v(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = i; >>> } >>> return v; >>> } >>> >>> >>> vector increasing_dupes(int n) { >>> vector v(n); >>> int r = (int)sqrt(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = i / r; >>> } >>> return v; >>> } >>> >>> >>> vector mostly_increasing(int n) { >>> vector v(n); >>> int r = (int)sqrt(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = i + rand() % r; >>> } >>> return v; >>> } >>> >>> >>> vector decreasing_unique(int n) { >>> vector v(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = n - 1 - i; >>> } >>> return v; >>> } >>> >>> vector decreasing_dupes(int n) { >>> vector v(n); >>> int r = (int)sqrt(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = (n - 1 - i) / r; >>> } >>> return v; >>> } >>> >>> vector mostly_decreasing(int n) { >>> vector v(n); >>> int r = (int)sqrt(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = n - 1 - i + rand() % r; >>> } >>> return v; >>> } >>> >>> vector sawtooth_unique(int n) { >>> int r = (int)sqrt(n); >>> while (gcd(r, n) != 1) ++r; >>> vector v(n); >>> for (int i = 0; i < n; ++i) { >>> v[(i * r) % n] = i; >>> } >>> return v; >>> } >>> >>> vector sawtooth_dupes(int n) { >>> int r = (int)sqrt(n); >>> vector v(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = i % r; >>> } >>> return v; >>> } >>> >>> vector increasing_with_swap(int n) { >>> vector v(n); >>> for (int i = 0; i < n; ++i) { >>> v[i] = i; >>> } >>> swap(v[0], v[n - 1]); >>> return v; >>> } >>> >>> vector increasing_fast_slow(int n) { >>> int r = (int)sqrt(n); >>> vector v(n); >>> for (int i = 0; i < r; ++i) { >>> v[i] = i * r; >>> } >>> for (int i = r; i < n; ++i) { >>> v[i] = (i - r) * n / (n - r); >>> } >>> return v; >>> } >>> >>> >>> #define BENCHMARK(dist, n) benchmark(#dist, dist, n) >>> >>> int main() { >>> int n = 100000; >>> BENCHMARK(random_unique, n); >>> BENCHMARK(random_dupes, n); >>> BENCHMARK(increasing_unique, n); >>> BENCHMARK(increasing_dupes, n); >>> BENCHMARK(mostly_increasing, n); >>> BENCHMARK(decreasing_unique, n); >>> BENCHMARK(decreasing_dupes, n); >>> BENCHMARK(mostly_decreasing, n); >>> BENCHMARK(sawtooth_unique, n); >>> BENCHMARK(sawtooth_dupes, n); >>> BENCHMARK(increasing_with_swap, n); >>> BENCHMARK(increasing_fast_slow, n); >>> >>> return 0; >>> } >>