* [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible @ 2016-05-01 14:18 Eelis 2016-05-01 14:30 ` Eelis 0 siblings, 1 reply; 16+ messages in thread From: Eelis @ 2016-05-01 14:18 UTC (permalink / raw) To: gcc-patches [-- Attachment #1: Type: text/plain, Size: 494 bytes --] Hi, The attached patch optimizes std::shuffle for the very common case where the generator range is large enough that a single invocation can produce two swap positions. This reduces the runtime of the following testcase by 37% on my machine: int main() { std::mt19937 gen; std::vector<int> v; v.reserve(10000); for (int i = 0; i != 10000; ++i) { v.push_back(i); std::shuffle(v.begin(), v.end(), gen); } std::cout << v.front() << '\n'; } Thoughts? Thanks, Eelis [-- Attachment #2: double-step-shuffle.patch --] [-- Type: text/x-patch, Size: 2440 bytes --] Index: libstdc++-v3/include/bits/stl_algo.h =================================================================== --- libstdc++-v3/include/bits/stl_algo.h (revision 235680) +++ libstdc++-v3/include/bits/stl_algo.h (working copy) @@ -3708,6 +3708,22 @@ #endif #ifdef _GLIBCXX_USE_C99_STDINT_TR1 + + template<typename _IntType, typename _UniformRandomNumberGenerator> + inline _IntType + __generate_random_index_below(_IntType __bound, _UniformRandomNumberGenerator& __g) + { + const _IntType __urngrange = __g.max() - __g.min() + 1; + const _IntType __scaling = __urngrange / __bound; + const _IntType __past = __bound * __scaling; + + for (;;) + { + const _IntType __r = _IntType(__g()) - __g.min(); + if (__r < __past) return __r / __scaling; + } + } + /** * @brief Shuffle the elements of a sequence using a uniform random * number generator. @@ -3740,6 +3756,40 @@ typedef typename std::make_unsigned<_DistanceType>::type __ud_type; typedef typename std::uniform_int_distribution<__ud_type> __distr_type; typedef typename __distr_type::param_type __p_type; + + typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen; + typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type; + + const __uc_type __urngrange = _Gen::max() - _Gen::min() + 1; + const __uc_type __urange = __uc_type(__last - __first); + + if (__urngrange / __urange >= __urange) + // I.e. (__urngrange >= __urange * __urange) but without wrap issues. + { + for (_RandomAccessIterator __i = __first + 1; __i != __last; ) + { + const __uc_type __swap_range = __uc_type(__i - __first) + 1; + + if (__i + 1 == __last) + { + const __uc_type __pos = __generate_random_index_below(__swap_range, __g); + std::iter_swap(__i, __first + __pos); + return; + } + + // Use a single generator invocation to produce swap positions for + // both of the next two elements: + + const __uc_type __comp_range = __swap_range * (__swap_range + 1); + const __uc_type __pospos = __generate_random_index_below(__comp_range, __g); + + std::iter_swap(__i++, __first + (__pospos % __swap_range)); + std::iter_swap(__i++, __first + (__pospos / __swap_range)); + } + + return; + } + __distr_type __d; for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-05-01 14:18 [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible Eelis @ 2016-05-01 14:30 ` Eelis 2016-05-03 12:39 ` Jonathan Wakely 0 siblings, 1 reply; 16+ messages in thread From: Eelis @ 2016-05-01 14:30 UTC (permalink / raw) To: gcc-patches; +Cc: libstdc++ Sorry, forgot to include the libstdc++ list. On 2016-05-01 16:18, Eelis wrote: > Hi, > > The attached patch optimizes std::shuffle for the very common case > where the generator range is large enough that a single invocation > can produce two swap positions. > > This reduces the runtime of the following testcase by 37% on my machine: > > int main() > { > std::mt19937 gen; > > std::vector<int> v; > v.reserve(10000); > for (int i = 0; i != 10000; ++i) > { > v.push_back(i); > std::shuffle(v.begin(), v.end(), gen); > } > > std::cout << v.front() << '\n'; > } > > Thoughts? > > Thanks, > > Eelis > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-05-01 14:30 ` Eelis @ 2016-05-03 12:39 ` Jonathan Wakely 2016-05-03 14:42 ` Eelis van der Weegen 0 siblings, 1 reply; 16+ messages in thread From: Jonathan Wakely @ 2016-05-03 12:39 UTC (permalink / raw) To: Eelis; +Cc: libstdc++, gcc-patches ENOPATCH On 1 May 2016 at 15:21, Eelis <eelis@eelis.net> wrote: > Sorry, forgot to include the libstdc++ list. > > On 2016-05-01 16:18, Eelis wrote: >> >> Hi, >> >> The attached patch optimizes std::shuffle for the very common case >> where the generator range is large enough that a single invocation >> can produce two swap positions. >> >> This reduces the runtime of the following testcase by 37% on my machine: >> >> int main() >> { >> std::mt19937 gen; >> >> std::vector<int> v; >> v.reserve(10000); >> for (int i = 0; i != 10000; ++i) >> { >> v.push_back(i); >> std::shuffle(v.begin(), v.end(), gen); >> } >> >> std::cout << v.front() << '\n'; >> } >> >> Thoughts? >> >> Thanks, >> >> Eelis >> > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-05-03 12:39 ` Jonathan Wakely @ 2016-05-03 14:42 ` Eelis van der Weegen 2016-05-25 19:54 ` Eelis 2016-08-31 12:45 ` Jonathan Wakely 0 siblings, 2 replies; 16+ messages in thread From: Eelis van der Weegen @ 2016-05-03 14:42 UTC (permalink / raw) Cc: libstdc++, gcc-patches [-- Attachment #1: Type: text/plain, Size: 985 bytes --] Ah, thanks, I forgot to re-attach when I sent to include the libstdc++ list. On 2016-05-03 14:38, Jonathan Wakely wrote: > ENOPATCH > > On 1 May 2016 at 15:21, Eelis <eelis@eelis.net> wrote: >> Sorry, forgot to include the libstdc++ list. >> >> On 2016-05-01 16:18, Eelis wrote: >>> >>> Hi, >>> >>> The attached patch optimizes std::shuffle for the very common case >>> where the generator range is large enough that a single invocation >>> can produce two swap positions. >>> >>> This reduces the runtime of the following testcase by 37% on my machine: >>> >>> int main() >>> { >>> std::mt19937 gen; >>> >>> std::vector<int> v; >>> v.reserve(10000); >>> for (int i = 0; i != 10000; ++i) >>> { >>> v.push_back(i); >>> std::shuffle(v.begin(), v.end(), gen); >>> } >>> >>> std::cout << v.front() << '\n'; >>> } >>> >>> Thoughts? >>> >>> Thanks, >>> >>> Eelis >>> >> >> [-- Attachment #2: double-step-shuffle.patch --] [-- Type: text/x-patch, Size: 2440 bytes --] Index: libstdc++-v3/include/bits/stl_algo.h =================================================================== --- libstdc++-v3/include/bits/stl_algo.h (revision 235680) +++ libstdc++-v3/include/bits/stl_algo.h (working copy) @@ -3708,6 +3708,22 @@ #endif #ifdef _GLIBCXX_USE_C99_STDINT_TR1 + + template<typename _IntType, typename _UniformRandomNumberGenerator> + inline _IntType + __generate_random_index_below(_IntType __bound, _UniformRandomNumberGenerator& __g) + { + const _IntType __urngrange = __g.max() - __g.min() + 1; + const _IntType __scaling = __urngrange / __bound; + const _IntType __past = __bound * __scaling; + + for (;;) + { + const _IntType __r = _IntType(__g()) - __g.min(); + if (__r < __past) return __r / __scaling; + } + } + /** * @brief Shuffle the elements of a sequence using a uniform random * number generator. @@ -3740,6 +3756,40 @@ typedef typename std::make_unsigned<_DistanceType>::type __ud_type; typedef typename std::uniform_int_distribution<__ud_type> __distr_type; typedef typename __distr_type::param_type __p_type; + + typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen; + typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type; + + const __uc_type __urngrange = _Gen::max() - _Gen::min() + 1; + const __uc_type __urange = __uc_type(__last - __first); + + if (__urngrange / __urange >= __urange) + // I.e. (__urngrange >= __urange * __urange) but without wrap issues. + { + for (_RandomAccessIterator __i = __first + 1; __i != __last; ) + { + const __uc_type __swap_range = __uc_type(__i - __first) + 1; + + if (__i + 1 == __last) + { + const __uc_type __pos = __generate_random_index_below(__swap_range, __g); + std::iter_swap(__i, __first + __pos); + return; + } + + // Use a single generator invocation to produce swap positions for + // both of the next two elements: + + const __uc_type __comp_range = __swap_range * (__swap_range + 1); + const __uc_type __pospos = __generate_random_index_below(__comp_range, __g); + + std::iter_swap(__i++, __first + (__pospos % __swap_range)); + std::iter_swap(__i++, __first + (__pospos / __swap_range)); + } + + return; + } + __distr_type __d; for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-05-03 14:42 ` Eelis van der Weegen @ 2016-05-25 19:54 ` Eelis 2016-05-25 20:45 ` Eelis 2016-08-31 12:45 ` Jonathan Wakely 1 sibling, 1 reply; 16+ messages in thread From: Eelis @ 2016-05-25 19:54 UTC (permalink / raw) Cc: libstdc++, gcc-patches >>> On 2016-05-01 16:18, Eelis wrote: >>>> The attached patch optimizes std::shuffle for the very common case >>>> where the generator range is large enough that a single invocation >>>> can produce two swap positions. >>>> >>>> This reduces the runtime of the following testcase by 37% on my machine: Gentle ping. :) Did anyone get a chance to take a look at this? Does the idea seem sound? Does the implementation seem correct? Thanks, Eelis ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-05-25 19:54 ` Eelis @ 2016-05-25 20:45 ` Eelis 0 siblings, 0 replies; 16+ messages in thread From: Eelis @ 2016-05-25 20:45 UTC (permalink / raw) To: gcc-patches; +Cc: libstdc++, gcc-patches >>> On 2016-05-01 16:18, Eelis wrote: >>>> The attached patch optimizes std::shuffle for the very common case >>>> where the generator range is large enough that a single invocation >>>> can produce two swap positions. >>>> >>>> This reduces the runtime of the following testcase by 37% on my machine: Gentle ping. :) Did anyone get a chance to take a look at this? Does the idea seem sound? Does the implementation seem correct? Thanks, Eelis ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-05-03 14:42 ` Eelis van der Weegen 2016-05-25 19:54 ` Eelis @ 2016-08-31 12:45 ` Jonathan Wakely 2016-09-01 15:14 ` Jonathan Wakely 2016-09-02 18:20 ` Eelis van der Weegen 1 sibling, 2 replies; 16+ messages in thread From: Jonathan Wakely @ 2016-08-31 12:45 UTC (permalink / raw) To: Eelis van der Weegen; +Cc: libstdc++, gcc-patches On 03/05/16 16:42 +0200, Eelis van der Weegen wrote: >Ah, thanks, I forgot to re-attach when I sent to include the libstdc++ list. > >On 2016-05-03 14:38, Jonathan Wakely wrote: >>ENOPATCH >> >>On 1 May 2016 at 15:21, Eelis <eelis@eelis.net> wrote: >>>Sorry, forgot to include the libstdc++ list. >>> >>>On 2016-05-01 16:18, Eelis wrote: >>>> >>>>Hi, >>>> >>>>The attached patch optimizes std::shuffle for the very common case >>>>where the generator range is large enough that a single invocation >>>>can produce two swap positions. >>>> >>>>This reduces the runtime of the following testcase by 37% on my machine: >>>> >>>> int main() >>>> { >>>> std::mt19937 gen; >>>> >>>> std::vector<int> v; >>>> v.reserve(10000); >>>> for (int i = 0; i != 10000; ++i) >>>> { >>>> v.push_back(i); >>>> std::shuffle(v.begin(), v.end(), gen); >>>> } >>>> >>>> std::cout << v.front() << '\n'; >>>> } >>>> >>>>Thoughts? >>>> >>>>Thanks, >>>> >>>>Eelis >>>> >>> >>> > >Index: libstdc++-v3/include/bits/stl_algo.h >=================================================================== >--- libstdc++-v3/include/bits/stl_algo.h (revision 235680) >+++ libstdc++-v3/include/bits/stl_algo.h (working copy) >@@ -3708,6 +3708,22 @@ > #endif > > #ifdef _GLIBCXX_USE_C99_STDINT_TR1 >+ >+ template<typename _IntType, typename _UniformRandomNumberGenerator> We should avoid introducing new names based on "uniform random number generator" and use _UniformRandomBitGenerator as per https://wg21.link/p0346r1 >+ inline _IntType >+ __generate_random_index_below(_IntType __bound, _UniformRandomNumberGenerator& __g) >+ { >+ const _IntType __urngrange = __g.max() - __g.min() + 1; Similarly, let's use __urbgrange here. >+ const _IntType __scaling = __urngrange / __bound; I think I'd like either a comment on the function documenting the assumption about __bound and __g, or an explicit check: __glibcxx_assert( __scaling >= __bound ); >+ const _IntType __past = __bound * __scaling; >+ >+ for (;;) >+ { >+ const _IntType __r = _IntType(__g()) - __g.min(); >+ if (__r < __past) return __r / __scaling; This is basically the same algorithm as uniform_int_distribution so doesn't introduce any bias, right? Is this significantly faster than just using uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't need to duplicate the logic? (And people maintaining the code won't reconvince themselves it's correct every time they look at it :-) >+ } >+ } >+ > /** > * @brief Shuffle the elements of a sequence using a uniform random > * number generator. >@@ -3740,6 +3756,40 @@ > typedef typename std::make_unsigned<_DistanceType>::type __ud_type; > typedef typename std::uniform_int_distribution<__ud_type> __distr_type; > typedef typename __distr_type::param_type __p_type; >+ >+ typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen; >+ typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type; >+ >+ const __uc_type __urngrange = _Gen::max() - _Gen::min() + 1; >+ const __uc_type __urange = __uc_type(__last - __first); >+ >+ if (__urngrange / __urange >= __urange) >+ // I.e. (__urngrange >= __urange * __urange) but without wrap issues. >+ { >+ for (_RandomAccessIterator __i = __first + 1; __i != __last; ) >+ { >+ const __uc_type __swap_range = __uc_type(__i - __first) + 1; >+ >+ if (__i + 1 == __last) Could we hoist this test out of the loop somehow? If we change the loop condition to be __i+1 < __last we don't need to test it on every iteration, and then after the loop we can just do the final swap if (__urange % 2). >+ { >+ const __uc_type __pos = __generate_random_index_below(__swap_range, __g); >+ std::iter_swap(__i, __first + __pos); >+ return; >+ } >+ >+ // Use a single generator invocation to produce swap positions for >+ // both of the next two elements: >+ >+ const __uc_type __comp_range = __swap_range * (__swap_range + 1); >+ const __uc_type __pospos = __generate_random_index_below(__comp_range, __g); >+ >+ std::iter_swap(__i++, __first + (__pospos % __swap_range)); >+ std::iter_swap(__i++, __first + (__pospos / __swap_range)); I think I've convinced myself this is correct :-) Values of __pospos will be uniformly distributed in [0, __comp_range) and so the / and % results will be too. >+ } >+ >+ return; >+ } >+ > __distr_type __d; > > for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-08-31 12:45 ` Jonathan Wakely @ 2016-09-01 15:14 ` Jonathan Wakely 2016-09-01 15:27 ` Marc Glisse 2016-09-01 15:31 ` Eelis van der Weegen 2016-09-02 18:20 ` Eelis van der Weegen 1 sibling, 2 replies; 16+ messages in thread From: Jonathan Wakely @ 2016-09-01 15:14 UTC (permalink / raw) To: Eelis van der Weegen; +Cc: libstdc++, gcc-patches On 31/08/16 13:45 +0100, Jonathan Wakely wrote: >On 03/05/16 16:42 +0200, Eelis van der Weegen wrote: >>Ah, thanks, I forgot to re-attach when I sent to include the libstdc++ list. >> >>On 2016-05-03 14:38, Jonathan Wakely wrote: >>>ENOPATCH >>> >>>On 1 May 2016 at 15:21, Eelis <eelis@eelis.net> wrote: >>>>Sorry, forgot to include the libstdc++ list. >>>> >>>>On 2016-05-01 16:18, Eelis wrote: >>>>> >>>>>Hi, >>>>> >>>>>The attached patch optimizes std::shuffle for the very common case >>>>>where the generator range is large enough that a single invocation >>>>>can produce two swap positions. >>>>> >>>>>This reduces the runtime of the following testcase by 37% on my machine: >>>>> >>>>> int main() >>>>> { >>>>> std::mt19937 gen; >>>>> >>>>> std::vector<int> v; >>>>> v.reserve(10000); >>>>> for (int i = 0; i != 10000; ++i) >>>>> { >>>>> v.push_back(i); >>>>> std::shuffle(v.begin(), v.end(), gen); >>>>> } >>>>> >>>>> std::cout << v.front() << '\n'; >>>>> } >>>>> >>>>>Thoughts? >>>>> >>>>>Thanks, >>>>> >>>>>Eelis >>>>> >>>> >>>> >> > >>Index: libstdc++-v3/include/bits/stl_algo.h >>=================================================================== >>--- libstdc++-v3/include/bits/stl_algo.h (revision 235680) >>+++ libstdc++-v3/include/bits/stl_algo.h (working copy) >>@@ -3708,6 +3708,22 @@ >>#endif >> >>#ifdef _GLIBCXX_USE_C99_STDINT_TR1 >>+ >>+ template<typename _IntType, typename _UniformRandomNumberGenerator> > >We should avoid introducing new names based on "uniform random number >generator" and use _UniformRandomBitGenerator as per >https://wg21.link/p0346r1 > >>+ inline _IntType >>+ __generate_random_index_below(_IntType __bound, _UniformRandomNumberGenerator& __g) >>+ { >>+ const _IntType __urngrange = __g.max() - __g.min() + 1; > >Similarly, let's use __urbgrange here. > >>+ const _IntType __scaling = __urngrange / __bound; > >I think I'd like either a comment on the function documenting the >assumption about __bound and __g, or an explicit check: > > __glibcxx_assert( __scaling >= __bound ); > >>+ const _IntType __past = __bound * __scaling; >>+ >>+ for (;;) >>+ { >>+ const _IntType __r = _IntType(__g()) - __g.min(); >>+ if (__r < __past) return __r / __scaling; > >This is basically the same algorithm as uniform_int_distribution so >doesn't introduce any bias, right? > >Is this significantly faster than just using >uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't >need to duplicate the logic? (And people maintaining the code won't >reconvince themselves it's correct every time they look at it :-) > > >>+ } >>+ } >>+ >> /** >> * @brief Shuffle the elements of a sequence using a uniform random >> * number generator. >>@@ -3740,6 +3756,40 @@ >> typedef typename std::make_unsigned<_DistanceType>::type __ud_type; >> typedef typename std::uniform_int_distribution<__ud_type> __distr_type; >> typedef typename __distr_type::param_type __p_type; >>+ >>+ typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen; >>+ typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type; >>+ >>+ const __uc_type __urngrange = _Gen::max() - _Gen::min() + 1; >>+ const __uc_type __urange = __uc_type(__last - __first); >>+ >>+ if (__urngrange / __urange >= __urange) >>+ // I.e. (__urngrange >= __urange * __urange) but without wrap issues. >>+ { >>+ for (_RandomAccessIterator __i = __first + 1; __i != __last; ) >>+ { >>+ const __uc_type __swap_range = __uc_type(__i - __first) + 1; >>+ >>+ if (__i + 1 == __last) > >Could we hoist this test out of the loop somehow? > >If we change the loop condition to be __i+1 < __last we don't need to >test it on every iteration, and then after the loop we can just do >the final swap if (__urange % 2). > >>+ { >>+ const __uc_type __pos = __generate_random_index_below(__swap_range, __g); >>+ std::iter_swap(__i, __first + __pos); >>+ return; >>+ } >>+ >>+ // Use a single generator invocation to produce swap positions for >>+ // both of the next two elements: >>+ >>+ const __uc_type __comp_range = __swap_range * (__swap_range + 1); >>+ const __uc_type __pospos = __generate_random_index_below(__comp_range, __g); >>+ >>+ std::iter_swap(__i++, __first + (__pospos % __swap_range)); >>+ std::iter_swap(__i++, __first + (__pospos / __swap_range)); > >I think I've convinced myself this is correct :-) > >Values of __pospos will be uniformly distributed in [0, __comp_range) iThis is true, but ... >and so the / and % results will be too. This isn't. If __swap_range is 3, then __comp_range is 10 and __pospos is uniformly distributed in [0, 9]. (__pospos % __swap_range) is not uniformly distributed, we get P(0) = 0.4, P(1) = 0.3, P(2) = 0.3. Similarly, (__pospos / __swap_range) is not uniform, we get P(0) = 0.3, P(1) = 0.3, P(2) = 0.3, P(3) = 0.1 This means that certain permuations of the input are more likely than others, which fails to meet the requirements of the function. Or is there a flaw in my reasoning? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-09-01 15:14 ` Jonathan Wakely @ 2016-09-01 15:27 ` Marc Glisse 2016-09-01 15:35 ` Jonathan Wakely 2016-09-01 15:31 ` Eelis van der Weegen 1 sibling, 1 reply; 16+ messages in thread From: Marc Glisse @ 2016-09-01 15:27 UTC (permalink / raw) To: Jonathan Wakely; +Cc: Eelis van der Weegen, libstdc++, gcc-patches On Thu, 1 Sep 2016, Jonathan Wakely wrote: >>> + const __uc_type __comp_range = __swap_range * (__swap_range + 1); > > If __swap_range is 3, then __comp_range is 10 and ??? -- Marc Glisse ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-09-01 15:27 ` Marc Glisse @ 2016-09-01 15:35 ` Jonathan Wakely 0 siblings, 0 replies; 16+ messages in thread From: Jonathan Wakely @ 2016-09-01 15:35 UTC (permalink / raw) To: libstdc++; +Cc: Eelis van der Weegen, gcc-patches On 01/09/16 17:27 +0200, Marc Glisse wrote: >On Thu, 1 Sep 2016, Jonathan Wakely wrote: > >>>>+ const __uc_type __comp_range = __swap_range * (__swap_range + 1); >> >>If __swap_range is 3, then __comp_range is 10 and > >??? Bah :-) Thanks. I guess I read the code correctly the other day at least! ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-09-01 15:14 ` Jonathan Wakely 2016-09-01 15:27 ` Marc Glisse @ 2016-09-01 15:31 ` Eelis van der Weegen 2016-09-01 15:37 ` Jonathan Wakely 1 sibling, 1 reply; 16+ messages in thread From: Eelis van der Weegen @ 2016-09-01 15:31 UTC (permalink / raw) To: Jonathan Wakely; +Cc: libstdc++, gcc-patches On 2016-09-01 17:14, Jonathan Wakely wrote: > On 31/08/16 13:45 +0100, Jonathan Wakely wrote: >> On 03/05/16 16:42 +0200, Eelis van der Weegen wrote: >>> Ah, thanks, I forgot to re-attach when I sent to include the libstdc++ list. >>> >>> On 2016-05-03 14:38, Jonathan Wakely wrote: >>>> ENOPATCH >>>> >>>> On 1 May 2016 at 15:21, Eelis <eelis@eelis.net> wrote: >>>>> Sorry, forgot to include the libstdc++ list. >>>>> >>>>> On 2016-05-01 16:18, Eelis wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> The attached patch optimizes std::shuffle for the very common case >>>>>> where the generator range is large enough that a single invocation >>>>>> can produce two swap positions. >>>>>> >>>>>> This reduces the runtime of the following testcase by 37% on my machine: >>>>>> >>>>>> int main() >>>>>> { >>>>>> std::mt19937 gen; >>>>>> >>>>>> std::vector<int> v; >>>>>> v.reserve(10000); >>>>>> for (int i = 0; i != 10000; ++i) >>>>>> { >>>>>> v.push_back(i); >>>>>> std::shuffle(v.begin(), v.end(), gen); >>>>>> } >>>>>> >>>>>> std::cout << v.front() << '\n'; >>>>>> } >>>>>> >>>>>> Thoughts? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Eelis >>>>>> >>>>> >>>>> >>> >> >>> Index: libstdc++-v3/include/bits/stl_algo.h >>> =================================================================== >>> --- libstdc++-v3/include/bits/stl_algo.h (revision 235680) >>> +++ libstdc++-v3/include/bits/stl_algo.h (working copy) >>> @@ -3708,6 +3708,22 @@ >>> #endif >>> >>> #ifdef _GLIBCXX_USE_C99_STDINT_TR1 >>> + >>> + template<typename _IntType, typename _UniformRandomNumberGenerator> >> >> We should avoid introducing new names based on "uniform random number >> generator" and use _UniformRandomBitGenerator as per >> https://wg21.link/p0346r1 >> >>> + inline _IntType >>> + __generate_random_index_below(_IntType __bound, _UniformRandomNumberGenerator& __g) >>> + { >>> + const _IntType __urngrange = __g.max() - __g.min() + 1; >> >> Similarly, let's use __urbgrange here. >> >>> + const _IntType __scaling = __urngrange / __bound; >> >> I think I'd like either a comment on the function documenting the >> assumption about __bound and __g, or an explicit check: >> >> __glibcxx_assert( __scaling >= __bound ); >> >>> + const _IntType __past = __bound * __scaling; >>> + >>> + for (;;) >>> + { >>> + const _IntType __r = _IntType(__g()) - __g.min(); >>> + if (__r < __past) return __r / __scaling; >> >> This is basically the same algorithm as uniform_int_distribution so >> doesn't introduce any bias, right? >> >> Is this significantly faster than just using >> uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't >> need to duplicate the logic? (And people maintaining the code won't >> reconvince themselves it's correct every time they look at it :-) >> >> >>> + } >>> + } >>> + >>> /** >>> * @brief Shuffle the elements of a sequence using a uniform random >>> * number generator. >>> @@ -3740,6 +3756,40 @@ >>> typedef typename std::make_unsigned<_DistanceType>::type __ud_type; >>> typedef typename std::uniform_int_distribution<__ud_type> __distr_type; >>> typedef typename __distr_type::param_type __p_type; >>> + >>> + typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen; >>> + typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type; >>> + >>> + const __uc_type __urngrange = _Gen::max() - _Gen::min() + 1; >>> + const __uc_type __urange = __uc_type(__last - __first); >>> + >>> + if (__urngrange / __urange >= __urange) >>> + // I.e. (__urngrange >= __urange * __urange) but without wrap issues. >>> + { >>> + for (_RandomAccessIterator __i = __first + 1; __i != __last; ) >>> + { >>> + const __uc_type __swap_range = __uc_type(__i - __first) + 1; >>> + >>> + if (__i + 1 == __last) >> >> Could we hoist this test out of the loop somehow? >> >> If we change the loop condition to be __i+1 < __last we don't need to >> test it on every iteration, and then after the loop we can just do >> the final swap if (__urange % 2). >> >>> + { >>> + const __uc_type __pos = __generate_random_index_below(__swap_range, __g); >>> + std::iter_swap(__i, __first + __pos); >>> + return; >>> + } >>> + >>> + // Use a single generator invocation to produce swap positions for >>> + // both of the next two elements: >>> + >>> + const __uc_type __comp_range = __swap_range * (__swap_range + 1); >>> + const __uc_type __pospos = __generate_random_index_below(__comp_range, __g); >>> + >>> + std::iter_swap(__i++, __first + (__pospos % __swap_range)); >>> + std::iter_swap(__i++, __first + (__pospos / __swap_range)); >> >> I think I've convinced myself this is correct :-) >> >> Values of __pospos will be uniformly distributed in [0, __comp_range) > > iThis is true, but ... > >> and so the / and % results will be too. > > This isn't. > > If __swap_range is 3, then __comp_range is 10 and > __pospos is uniformly distributed in [0, 9]. > > (__pospos % __swap_range) is not uniformly distributed, we get > P(0) = 0.4, P(1) = 0.3, P(2) = 0.3. > > Similarly, (__pospos / __swap_range) is not uniform, we get > P(0) = 0.3, P(1) = 0.3, P(2) = 0.3, P(3) = 0.1 > > This means that certain permuations of the input are more likely than > others, which fails to meet the requirements of the function. > > Or is there a flaw in my reasoning? Just that if __swap_range is 3, then __comp_range is 3*(3+1)=12, not 10. :) Thanks for the review! I'll send an updated patch addressing the other issues soon. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-09-01 15:31 ` Eelis van der Weegen @ 2016-09-01 15:37 ` Jonathan Wakely 0 siblings, 0 replies; 16+ messages in thread From: Jonathan Wakely @ 2016-09-01 15:37 UTC (permalink / raw) To: Eelis van der Weegen; +Cc: libstdc++, gcc-patches On 01/09/16 17:31 +0200, Eelis van der Weegen wrote: >On 2016-09-01 17:14, Jonathan Wakely wrote: >>On 31/08/16 13:45 +0100, Jonathan Wakely wrote: >>>On 03/05/16 16:42 +0200, Eelis van der Weegen wrote: >>>>Ah, thanks, I forgot to re-attach when I sent to include the libstdc++ list. >>>> >>>>On 2016-05-03 14:38, Jonathan Wakely wrote: >>>>>ENOPATCH >>>>> >>>>>On 1 May 2016 at 15:21, Eelis <eelis@eelis.net> wrote: >>>>>>Sorry, forgot to include the libstdc++ list. >>>>>> >>>>>>On 2016-05-01 16:18, Eelis wrote: >>>>>>> >>>>>>>Hi, >>>>>>> >>>>>>>The attached patch optimizes std::shuffle for the very common case >>>>>>>where the generator range is large enough that a single invocation >>>>>>>can produce two swap positions. >>>>>>> >>>>>>>This reduces the runtime of the following testcase by 37% on my machine: >>>>>>> >>>>>>> int main() >>>>>>> { >>>>>>> std::mt19937 gen; >>>>>>> >>>>>>> std::vector<int> v; >>>>>>> v.reserve(10000); >>>>>>> for (int i = 0; i != 10000; ++i) >>>>>>> { >>>>>>> v.push_back(i); >>>>>>> std::shuffle(v.begin(), v.end(), gen); >>>>>>> } >>>>>>> >>>>>>> std::cout << v.front() << '\n'; >>>>>>> } >>>>>>> >>>>>>>Thoughts? >>>>>>> >>>>>>>Thanks, >>>>>>> >>>>>>>Eelis >>>>>>> >>>>>> >>>>>> >>>> >>> >>>>Index: libstdc++-v3/include/bits/stl_algo.h >>>>=================================================================== >>>>--- libstdc++-v3/include/bits/stl_algo.h (revision 235680) >>>>+++ libstdc++-v3/include/bits/stl_algo.h (working copy) >>>>@@ -3708,6 +3708,22 @@ >>>>#endif >>>> >>>>#ifdef _GLIBCXX_USE_C99_STDINT_TR1 >>>>+ >>>>+ template<typename _IntType, typename _UniformRandomNumberGenerator> >>> >>>We should avoid introducing new names based on "uniform random number >>>generator" and use _UniformRandomBitGenerator as per >>>https://wg21.link/p0346r1 >>> >>>>+ inline _IntType >>>>+ __generate_random_index_below(_IntType __bound, _UniformRandomNumberGenerator& __g) >>>>+ { >>>>+ const _IntType __urngrange = __g.max() - __g.min() + 1; >>> >>>Similarly, let's use __urbgrange here. >>> >>>>+ const _IntType __scaling = __urngrange / __bound; >>> >>>I think I'd like either a comment on the function documenting the >>>assumption about __bound and __g, or an explicit check: >>> >>> __glibcxx_assert( __scaling >= __bound ); >>> >>>>+ const _IntType __past = __bound * __scaling; >>>>+ >>>>+ for (;;) >>>>+ { >>>>+ const _IntType __r = _IntType(__g()) - __g.min(); >>>>+ if (__r < __past) return __r / __scaling; >>> >>>This is basically the same algorithm as uniform_int_distribution so >>>doesn't introduce any bias, right? >>> >>>Is this significantly faster than just using >>>uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't >>>need to duplicate the logic? (And people maintaining the code won't >>>reconvince themselves it's correct every time they look at it :-) >>> >>> >>>>+ } >>>>+ } >>>>+ >>>> /** >>>> * @brief Shuffle the elements of a sequence using a uniform random >>>> * number generator. >>>>@@ -3740,6 +3756,40 @@ >>>> typedef typename std::make_unsigned<_DistanceType>::type __ud_type; >>>> typedef typename std::uniform_int_distribution<__ud_type> __distr_type; >>>> typedef typename __distr_type::param_type __p_type; >>>>+ >>>>+ typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen; >>>>+ typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type; >>>>+ >>>>+ const __uc_type __urngrange = _Gen::max() - _Gen::min() + 1; >>>>+ const __uc_type __urange = __uc_type(__last - __first); >>>>+ >>>>+ if (__urngrange / __urange >= __urange) >>>>+ // I.e. (__urngrange >= __urange * __urange) but without wrap issues. >>>>+ { >>>>+ for (_RandomAccessIterator __i = __first + 1; __i != __last; ) >>>>+ { >>>>+ const __uc_type __swap_range = __uc_type(__i - __first) + 1; >>>>+ >>>>+ if (__i + 1 == __last) >>> >>>Could we hoist this test out of the loop somehow? >>> >>>If we change the loop condition to be __i+1 < __last we don't need to >>>test it on every iteration, and then after the loop we can just do >>>the final swap if (__urange % 2). >>> >>>>+ { >>>>+ const __uc_type __pos = __generate_random_index_below(__swap_range, __g); >>>>+ std::iter_swap(__i, __first + __pos); >>>>+ return; >>>>+ } >>>>+ >>>>+ // Use a single generator invocation to produce swap positions for >>>>+ // both of the next two elements: >>>>+ >>>>+ const __uc_type __comp_range = __swap_range * (__swap_range + 1); >>>>+ const __uc_type __pospos = __generate_random_index_below(__comp_range, __g); >>>>+ >>>>+ std::iter_swap(__i++, __first + (__pospos % __swap_range)); >>>>+ std::iter_swap(__i++, __first + (__pospos / __swap_range)); >>> >>>I think I've convinced myself this is correct :-) >>> >>>Values of __pospos will be uniformly distributed in [0, __comp_range) >> >>iThis is true, but ... >> >>>and so the / and % results will be too. >> >>This isn't. >> >>If __swap_range is 3, then __comp_range is 10 and >>__pospos is uniformly distributed in [0, 9]. >> >>(__pospos % __swap_range) is not uniformly distributed, we get >>P(0) = 0.4, P(1) = 0.3, P(2) = 0.3. >> >>Similarly, (__pospos / __swap_range) is not uniform, we get >>P(0) = 0.3, P(1) = 0.3, P(2) = 0.3, P(3) = 0.1 >> >>This means that certain permuations of the input are more likely than >>others, which fails to meet the requirements of the function. >> >>Or is there a flaw in my reasoning? > >Just that if __swap_range is 3, then __comp_range is 3*(3+1)=12, not 10. :) A minor detail! >Thanks for the review! I'll send an updated patch addressing the other issues soon. I think my earlier comments are still sane, just ignore today's hasty email. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-08-31 12:45 ` Jonathan Wakely 2016-09-01 15:14 ` Jonathan Wakely @ 2016-09-02 18:20 ` Eelis van der Weegen 2016-09-02 18:53 ` Eelis 1 sibling, 1 reply; 16+ messages in thread From: Eelis van der Weegen @ 2016-09-02 18:20 UTC (permalink / raw) To: Jonathan Wakely; +Cc: libstdc++, gcc-patches [-- Attachment #1: Type: text/plain, Size: 816 bytes --] On 2016-08-31 14:45, Jonathan Wakely wrote: > Is this significantly faster than just using > uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't > need to duplicate the logic? (And people maintaining the code won't > reconvince themselves it's correct every time they look at it :-) > >[..] > > Could we hoist this test out of the loop somehow? > > If we change the loop condition to be __i+1 < __last we don't need to > test it on every iteration, and then after the loop we can just do > the final swap if (__urange % 2). Reusing std::uniform_int_distribution seems just as fast, so I've removed __generate_random_index_below. I've hoisted the (__i + 1 == __last) check out of the loop (in a slightly different way), and it seems to shave off a couple more cycles, yay! Updated patch attached. [-- Attachment #2: new-double-step-shuffle.patch --] [-- Type: text/x-patch, Size: 1940 bytes --] Index: libstdc++-v3/include/bits/stl_algo.h =================================================================== --- libstdc++-v3/include/bits/stl_algo.h (revision 239895) +++ libstdc++-v3/include/bits/stl_algo.h (working copy) @@ -3772,6 +3772,46 @@ typedef typename std::make_unsigned<_DistanceType>::type __ud_type; typedef typename std::uniform_int_distribution<__ud_type> __distr_type; typedef typename __distr_type::param_type __p_type; + + typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen; + typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type; + + const __uc_type __urngrange = __g.max() - __g.min(); + const __uc_type __urange = __uc_type(__last - __first); + + if (__urngrange / __urange >= __urange) + // I.e. (__urngrange >= __urange * __urange) but without wrap issues. + { + _RandomAccessIterator __i = __first + 1; + + // Since we know the range isn't empty, an even number of elements + // means an uneven number of elements /to swap/, in which case we + // do the first one up front: + + if ((__urange % 2) == 0) + { + std::iter_swap(__i++, __first + (__g() & 1)); + } + + // Now we know that __last - __i is even, so we do the rest in pairs, + // using a single distribution invocation to produce swap positions + // for two successive elements at a time: + + while (__i != __last) + { + const __uc_type __swap_range = __uc_type(__i - __first) + 1; + const __uc_type __comp_range = __swap_range * (__swap_range + 1); + + std::uniform_int_distribution<__uc_type> __d{0, __comp_range - 1}; + const __uc_type __pospos = __d(__g); + + std::iter_swap(__i++, __first + (__pospos % __swap_range)); + std::iter_swap(__i++, __first + (__pospos / __swap_range)); + } + + return; + } + __distr_type __d; for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-09-02 18:20 ` Eelis van der Weegen @ 2016-09-02 18:53 ` Eelis 2016-09-02 19:27 ` Eelis 2016-10-14 19:42 ` Jonathan Wakely 0 siblings, 2 replies; 16+ messages in thread From: Eelis @ 2016-09-02 18:53 UTC (permalink / raw) To: Jonathan Wakely; +Cc: libstdc++, gcc-patches [-- Attachment #1: Type: text/plain, Size: 1093 bytes --] On 2016-09-02 20:20, Eelis van der Weegen wrote: > On 2016-08-31 14:45, Jonathan Wakely wrote: >> Is this significantly faster than just using >> uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't >> need to duplicate the logic? (And people maintaining the code won't >> reconvince themselves it's correct every time they look at it :-) >> >> [..] >> >> Could we hoist this test out of the loop somehow? >> >> If we change the loop condition to be __i+1 < __last we don't need to >> test it on every iteration, and then after the loop we can just do >> the final swap if (__urange % 2). > > Reusing std::uniform_int_distribution seems just as fast, so I've removed __generate_random_index_below. > > I've hoisted the (__i + 1 == __last) check out of the loop (in a slightly different way), and it seems to shave off a couple more cycles, yay! > > Updated patch attached. > Please ignore that patch, I used __g()&1 but that's invalid (the new "UniformRandomBitGenerator" name is misleading). Updated patch (which uses a proper distribution even for the [0,1] case) attached. [-- Attachment #2: newer-double-step-shuffle.patch --] [-- Type: text/x-patch, Size: 1965 bytes --] Index: libstdc++-v3/include/bits/stl_algo.h =================================================================== --- libstdc++-v3/include/bits/stl_algo.h (revision 239895) +++ libstdc++-v3/include/bits/stl_algo.h (working copy) @@ -3772,6 +3772,47 @@ typedef typename std::make_unsigned<_DistanceType>::type __ud_type; typedef typename std::uniform_int_distribution<__ud_type> __distr_type; typedef typename __distr_type::param_type __p_type; + + typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen; + typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type; + + const __uc_type __urngrange = __g.max() - __g.min(); + const __uc_type __urange = __uc_type(__last - __first); + + if (__urngrange / __urange >= __urange) + // I.e. (__urngrange >= __urange * __urange) but without wrap issues. + { + _RandomAccessIterator __i = __first + 1; + + // Since we know the range isn't empty, an even number of elements + // means an uneven number of elements /to swap/, in which case we + // do the first one up front: + + if ((__urange % 2) == 0) + { + __distr_type __d{0, 1}; + std::iter_swap(__i++, __first + __d(__g)); + } + + // Now we know that __last - __i is even, so we do the rest in pairs, + // using a single distribution invocation to produce swap positions + // for two successive elements at a time: + + while (__i != __last) + { + const __uc_type __swap_range = __uc_type(__i - __first) + 1; + const __uc_type __comp_range = __swap_range * (__swap_range + 1); + + std::uniform_int_distribution<__uc_type> __d{0, __comp_range - 1}; + const __uc_type __pospos = __d(__g); + + std::iter_swap(__i++, __first + (__pospos % __swap_range)); + std::iter_swap(__i++, __first + (__pospos / __swap_range)); + } + + return; + } + __distr_type __d; for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-09-02 18:53 ` Eelis @ 2016-09-02 19:27 ` Eelis 2016-10-14 19:42 ` Jonathan Wakely 1 sibling, 0 replies; 16+ messages in thread From: Eelis @ 2016-09-02 19:27 UTC (permalink / raw) To: gcc-patches; +Cc: libstdc++ [-- Attachment #1: Type: text/plain, Size: 1093 bytes --] On 2016-09-02 20:20, Eelis van der Weegen wrote: > On 2016-08-31 14:45, Jonathan Wakely wrote: >> Is this significantly faster than just using >> uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't >> need to duplicate the logic? (And people maintaining the code won't >> reconvince themselves it's correct every time they look at it :-) >> >> [..] >> >> Could we hoist this test out of the loop somehow? >> >> If we change the loop condition to be __i+1 < __last we don't need to >> test it on every iteration, and then after the loop we can just do >> the final swap if (__urange % 2). > > Reusing std::uniform_int_distribution seems just as fast, so I've removed __generate_random_index_below. > > I've hoisted the (__i + 1 == __last) check out of the loop (in a slightly different way), and it seems to shave off a couple more cycles, yay! > > Updated patch attached. > Please ignore that patch, I used __g()&1 but that's invalid (the new "UniformRandomBitGenerator" name is misleading). Updated patch (which uses a proper distribution even for the [0,1] case) attached. [-- Attachment #2: newer-double-step-shuffle.patch --] [-- Type: text/x-patch, Size: 1965 bytes --] Index: libstdc++-v3/include/bits/stl_algo.h =================================================================== --- libstdc++-v3/include/bits/stl_algo.h (revision 239895) +++ libstdc++-v3/include/bits/stl_algo.h (working copy) @@ -3772,6 +3772,47 @@ typedef typename std::make_unsigned<_DistanceType>::type __ud_type; typedef typename std::uniform_int_distribution<__ud_type> __distr_type; typedef typename __distr_type::param_type __p_type; + + typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen; + typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type; + + const __uc_type __urngrange = __g.max() - __g.min(); + const __uc_type __urange = __uc_type(__last - __first); + + if (__urngrange / __urange >= __urange) + // I.e. (__urngrange >= __urange * __urange) but without wrap issues. + { + _RandomAccessIterator __i = __first + 1; + + // Since we know the range isn't empty, an even number of elements + // means an uneven number of elements /to swap/, in which case we + // do the first one up front: + + if ((__urange % 2) == 0) + { + __distr_type __d{0, 1}; + std::iter_swap(__i++, __first + __d(__g)); + } + + // Now we know that __last - __i is even, so we do the rest in pairs, + // using a single distribution invocation to produce swap positions + // for two successive elements at a time: + + while (__i != __last) + { + const __uc_type __swap_range = __uc_type(__i - __first) + 1; + const __uc_type __comp_range = __swap_range * (__swap_range + 1); + + std::uniform_int_distribution<__uc_type> __d{0, __comp_range - 1}; + const __uc_type __pospos = __d(__g); + + std::iter_swap(__i++, __first + (__pospos % __swap_range)); + std::iter_swap(__i++, __first + (__pospos / __swap_range)); + } + + return; + } + __distr_type __d; for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible 2016-09-02 18:53 ` Eelis 2016-09-02 19:27 ` Eelis @ 2016-10-14 19:42 ` Jonathan Wakely 1 sibling, 0 replies; 16+ messages in thread From: Jonathan Wakely @ 2016-10-14 19:42 UTC (permalink / raw) To: Eelis; +Cc: libstdc++, gcc-patches [-- Attachment #1: Type: text/plain, Size: 1256 bytes --] On 02/09/16 20:53 +0200, Eelis wrote: >On 2016-09-02 20:20, Eelis van der Weegen wrote: >>On 2016-08-31 14:45, Jonathan Wakely wrote: >>>Is this significantly faster than just using >>>uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't >>>need to duplicate the logic? (And people maintaining the code won't >>>reconvince themselves it's correct every time they look at it :-) >>> >>>[..] >>> >>>Could we hoist this test out of the loop somehow? >>> >>>If we change the loop condition to be __i+1 < __last we don't need to >>>test it on every iteration, and then after the loop we can just do >>>the final swap if (__urange % 2). >> >>Reusing std::uniform_int_distribution seems just as fast, so I've removed __generate_random_index_below. >> >>I've hoisted the (__i + 1 == __last) check out of the loop (in a slightly different way), and it seems to shave off a couple more cycles, yay! >> >>Updated patch attached. >> > >Please ignore that patch, I used __g()&1 but that's invalid (the new "UniformRandomBitGenerator" name is misleading). > >Updated patch (which uses a proper distribution even for the [0,1] case) attached. I've finally got round to committing this patch to trunk. Thanks for your patience, and sorry for the delay! [-- Attachment #2: patch.txt --] [-- Type: text/x-patch, Size: 2384 bytes --] commit 1ddf9566764a85da4826628098b352bd30ba2bbc Author: Jonathan Wakely <jwakely@redhat.com> Date: Fri Oct 14 20:27:13 2016 +0100 Optimize std::shuffle by using generator to get two values at once 2016-10-14 Eelis van der Weegen <eelis@eelis.net> * include/bits/stl_algo.h (shuffle): Extract two random numbers from each generator invocation when its range is large enough. diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h index 0538a79..db99cb8 100644 --- a/libstdc++-v3/include/bits/stl_algo.h +++ b/libstdc++-v3/include/bits/stl_algo.h @@ -3772,6 +3772,47 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION typedef typename std::make_unsigned<_DistanceType>::type __ud_type; typedef typename std::uniform_int_distribution<__ud_type> __distr_type; typedef typename __distr_type::param_type __p_type; + + typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen; + typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type; + + const __uc_type __urngrange = __g.max() - __g.min(); + const __uc_type __urange = __uc_type(__last - __first); + + if (__urngrange / __urange >= __urange) + // I.e. (__urngrange >= __urange * __urange) but without wrap issues. + { + _RandomAccessIterator __i = __first + 1; + + // Since we know the range isn't empty, an even number of elements + // means an uneven number of elements /to swap/, in which case we + // do the first one up front: + + if ((__urange % 2) == 0) + { + __distr_type __d{0, 1}; + std::iter_swap(__i++, __first + __d(__g)); + } + + // Now we know that __last - __i is even, so we do the rest in pairs, + // using a single distribution invocation to produce swap positions + // for two successive elements at a time: + + while (__i != __last) + { + const __uc_type __swap_range = __uc_type(__i - __first) + 1; + const __uc_type __comp_range = __swap_range * (__swap_range + 1); + + std::uniform_int_distribution<__uc_type> __d{0, __comp_range - 1}; + const __uc_type __pospos = __d(__g); + + std::iter_swap(__i++, __first + (__pospos % __swap_range)); + std::iter_swap(__i++, __first + (__pospos / __swap_range)); + } + + return; + } + __distr_type __d; for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i) ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2016-10-14 19:42 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-05-01 14:18 [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible Eelis 2016-05-01 14:30 ` Eelis 2016-05-03 12:39 ` Jonathan Wakely 2016-05-03 14:42 ` Eelis van der Weegen 2016-05-25 19:54 ` Eelis 2016-05-25 20:45 ` Eelis 2016-08-31 12:45 ` Jonathan Wakely 2016-09-01 15:14 ` Jonathan Wakely 2016-09-01 15:27 ` Marc Glisse 2016-09-01 15:35 ` Jonathan Wakely 2016-09-01 15:31 ` Eelis van der Weegen 2016-09-01 15:37 ` Jonathan Wakely 2016-09-02 18:20 ` Eelis van der Weegen 2016-09-02 18:53 ` Eelis 2016-09-02 19:27 ` Eelis 2016-10-14 19:42 ` Jonathan Wakely
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).