* [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
@ 2016-05-01 14:18 Eelis
2016-05-01 14:30 ` Eelis
0 siblings, 1 reply; 16+ messages in thread
From: Eelis @ 2016-05-01 14:18 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 494 bytes --]
Hi,
The attached patch optimizes std::shuffle for the very common case
where the generator range is large enough that a single invocation
can produce two swap positions.
This reduces the runtime of the following testcase by 37% on my machine:
int main()
{
std::mt19937 gen;
std::vector<int> v;
v.reserve(10000);
for (int i = 0; i != 10000; ++i)
{
v.push_back(i);
std::shuffle(v.begin(), v.end(), gen);
}
std::cout << v.front() << '\n';
}
Thoughts?
Thanks,
Eelis
[-- Attachment #2: double-step-shuffle.patch --]
[-- Type: text/x-patch, Size: 2440 bytes --]
Index: libstdc++-v3/include/bits/stl_algo.h
===================================================================
--- libstdc++-v3/include/bits/stl_algo.h (revision 235680)
+++ libstdc++-v3/include/bits/stl_algo.h (working copy)
@@ -3708,6 +3708,22 @@
#endif
#ifdef _GLIBCXX_USE_C99_STDINT_TR1
+
+ template<typename _IntType, typename _UniformRandomNumberGenerator>
+ inline _IntType
+ __generate_random_index_below(_IntType __bound, _UniformRandomNumberGenerator& __g)
+ {
+ const _IntType __urngrange = __g.max() - __g.min() + 1;
+ const _IntType __scaling = __urngrange / __bound;
+ const _IntType __past = __bound * __scaling;
+
+ for (;;)
+ {
+ const _IntType __r = _IntType(__g()) - __g.min();
+ if (__r < __past) return __r / __scaling;
+ }
+ }
+
/**
* @brief Shuffle the elements of a sequence using a uniform random
* number generator.
@@ -3740,6 +3756,40 @@
typedef typename std::make_unsigned<_DistanceType>::type __ud_type;
typedef typename std::uniform_int_distribution<__ud_type> __distr_type;
typedef typename __distr_type::param_type __p_type;
+
+ typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen;
+ typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type;
+
+ const __uc_type __urngrange = _Gen::max() - _Gen::min() + 1;
+ const __uc_type __urange = __uc_type(__last - __first);
+
+ if (__urngrange / __urange >= __urange)
+ // I.e. (__urngrange >= __urange * __urange) but without wrap issues.
+ {
+ for (_RandomAccessIterator __i = __first + 1; __i != __last; )
+ {
+ const __uc_type __swap_range = __uc_type(__i - __first) + 1;
+
+ if (__i + 1 == __last)
+ {
+ const __uc_type __pos = __generate_random_index_below(__swap_range, __g);
+ std::iter_swap(__i, __first + __pos);
+ return;
+ }
+
+ // Use a single generator invocation to produce swap positions for
+ // both of the next two elements:
+
+ const __uc_type __comp_range = __swap_range * (__swap_range + 1);
+ const __uc_type __pospos = __generate_random_index_below(__comp_range, __g);
+
+ std::iter_swap(__i++, __first + (__pospos % __swap_range));
+ std::iter_swap(__i++, __first + (__pospos / __swap_range));
+ }
+
+ return;
+ }
+
__distr_type __d;
for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-05-01 14:18 [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible Eelis
@ 2016-05-01 14:30 ` Eelis
2016-05-03 12:39 ` Jonathan Wakely
0 siblings, 1 reply; 16+ messages in thread
From: Eelis @ 2016-05-01 14:30 UTC (permalink / raw)
To: gcc-patches; +Cc: libstdc++
Sorry, forgot to include the libstdc++ list.
On 2016-05-01 16:18, Eelis wrote:
> Hi,
>
> The attached patch optimizes std::shuffle for the very common case
> where the generator range is large enough that a single invocation
> can produce two swap positions.
>
> This reduces the runtime of the following testcase by 37% on my machine:
>
> int main()
> {
> std::mt19937 gen;
>
> std::vector<int> v;
> v.reserve(10000);
> for (int i = 0; i != 10000; ++i)
> {
> v.push_back(i);
> std::shuffle(v.begin(), v.end(), gen);
> }
>
> std::cout << v.front() << '\n';
> }
>
> Thoughts?
>
> Thanks,
>
> Eelis
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-05-01 14:30 ` Eelis
@ 2016-05-03 12:39 ` Jonathan Wakely
2016-05-03 14:42 ` Eelis van der Weegen
0 siblings, 1 reply; 16+ messages in thread
From: Jonathan Wakely @ 2016-05-03 12:39 UTC (permalink / raw)
To: Eelis; +Cc: libstdc++, gcc-patches
ENOPATCH
On 1 May 2016 at 15:21, Eelis <eelis@eelis.net> wrote:
> Sorry, forgot to include the libstdc++ list.
>
> On 2016-05-01 16:18, Eelis wrote:
>>
>> Hi,
>>
>> The attached patch optimizes std::shuffle for the very common case
>> where the generator range is large enough that a single invocation
>> can produce two swap positions.
>>
>> This reduces the runtime of the following testcase by 37% on my machine:
>>
>> int main()
>> {
>> std::mt19937 gen;
>>
>> std::vector<int> v;
>> v.reserve(10000);
>> for (int i = 0; i != 10000; ++i)
>> {
>> v.push_back(i);
>> std::shuffle(v.begin(), v.end(), gen);
>> }
>>
>> std::cout << v.front() << '\n';
>> }
>>
>> Thoughts?
>>
>> Thanks,
>>
>> Eelis
>>
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-05-03 12:39 ` Jonathan Wakely
@ 2016-05-03 14:42 ` Eelis van der Weegen
2016-05-25 19:54 ` Eelis
2016-08-31 12:45 ` Jonathan Wakely
0 siblings, 2 replies; 16+ messages in thread
From: Eelis van der Weegen @ 2016-05-03 14:42 UTC (permalink / raw)
Cc: libstdc++, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 985 bytes --]
Ah, thanks, I forgot to re-attach when I sent to include the libstdc++ list.
On 2016-05-03 14:38, Jonathan Wakely wrote:
> ENOPATCH
>
> On 1 May 2016 at 15:21, Eelis <eelis@eelis.net> wrote:
>> Sorry, forgot to include the libstdc++ list.
>>
>> On 2016-05-01 16:18, Eelis wrote:
>>>
>>> Hi,
>>>
>>> The attached patch optimizes std::shuffle for the very common case
>>> where the generator range is large enough that a single invocation
>>> can produce two swap positions.
>>>
>>> This reduces the runtime of the following testcase by 37% on my machine:
>>>
>>> int main()
>>> {
>>> std::mt19937 gen;
>>>
>>> std::vector<int> v;
>>> v.reserve(10000);
>>> for (int i = 0; i != 10000; ++i)
>>> {
>>> v.push_back(i);
>>> std::shuffle(v.begin(), v.end(), gen);
>>> }
>>>
>>> std::cout << v.front() << '\n';
>>> }
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>>
>>> Eelis
>>>
>>
>>
[-- Attachment #2: double-step-shuffle.patch --]
[-- Type: text/x-patch, Size: 2440 bytes --]
Index: libstdc++-v3/include/bits/stl_algo.h
===================================================================
--- libstdc++-v3/include/bits/stl_algo.h (revision 235680)
+++ libstdc++-v3/include/bits/stl_algo.h (working copy)
@@ -3708,6 +3708,22 @@
#endif
#ifdef _GLIBCXX_USE_C99_STDINT_TR1
+
+ template<typename _IntType, typename _UniformRandomNumberGenerator>
+ inline _IntType
+ __generate_random_index_below(_IntType __bound, _UniformRandomNumberGenerator& __g)
+ {
+ const _IntType __urngrange = __g.max() - __g.min() + 1;
+ const _IntType __scaling = __urngrange / __bound;
+ const _IntType __past = __bound * __scaling;
+
+ for (;;)
+ {
+ const _IntType __r = _IntType(__g()) - __g.min();
+ if (__r < __past) return __r / __scaling;
+ }
+ }
+
/**
* @brief Shuffle the elements of a sequence using a uniform random
* number generator.
@@ -3740,6 +3756,40 @@
typedef typename std::make_unsigned<_DistanceType>::type __ud_type;
typedef typename std::uniform_int_distribution<__ud_type> __distr_type;
typedef typename __distr_type::param_type __p_type;
+
+ typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen;
+ typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type;
+
+ const __uc_type __urngrange = _Gen::max() - _Gen::min() + 1;
+ const __uc_type __urange = __uc_type(__last - __first);
+
+ if (__urngrange / __urange >= __urange)
+ // I.e. (__urngrange >= __urange * __urange) but without wrap issues.
+ {
+ for (_RandomAccessIterator __i = __first + 1; __i != __last; )
+ {
+ const __uc_type __swap_range = __uc_type(__i - __first) + 1;
+
+ if (__i + 1 == __last)
+ {
+ const __uc_type __pos = __generate_random_index_below(__swap_range, __g);
+ std::iter_swap(__i, __first + __pos);
+ return;
+ }
+
+ // Use a single generator invocation to produce swap positions for
+ // both of the next two elements:
+
+ const __uc_type __comp_range = __swap_range * (__swap_range + 1);
+ const __uc_type __pospos = __generate_random_index_below(__comp_range, __g);
+
+ std::iter_swap(__i++, __first + (__pospos % __swap_range));
+ std::iter_swap(__i++, __first + (__pospos / __swap_range));
+ }
+
+ return;
+ }
+
__distr_type __d;
for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-05-03 14:42 ` Eelis van der Weegen
@ 2016-05-25 19:54 ` Eelis
2016-05-25 20:45 ` Eelis
2016-08-31 12:45 ` Jonathan Wakely
1 sibling, 1 reply; 16+ messages in thread
From: Eelis @ 2016-05-25 19:54 UTC (permalink / raw)
Cc: libstdc++, gcc-patches
>>> On 2016-05-01 16:18, Eelis wrote:
>>>> The attached patch optimizes std::shuffle for the very common case
>>>> where the generator range is large enough that a single invocation
>>>> can produce two swap positions.
>>>>
>>>> This reduces the runtime of the following testcase by 37% on my machine:
Gentle ping. :) Did anyone get a chance to take a look at this?
Does the idea seem sound? Does the implementation seem correct?
Thanks,
Eelis
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-05-25 19:54 ` Eelis
@ 2016-05-25 20:45 ` Eelis
0 siblings, 0 replies; 16+ messages in thread
From: Eelis @ 2016-05-25 20:45 UTC (permalink / raw)
To: gcc-patches; +Cc: libstdc++, gcc-patches
>>> On 2016-05-01 16:18, Eelis wrote:
>>>> The attached patch optimizes std::shuffle for the very common case
>>>> where the generator range is large enough that a single invocation
>>>> can produce two swap positions.
>>>>
>>>> This reduces the runtime of the following testcase by 37% on my machine:
Gentle ping. :) Did anyone get a chance to take a look at this?
Does the idea seem sound? Does the implementation seem correct?
Thanks,
Eelis
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-05-03 14:42 ` Eelis van der Weegen
2016-05-25 19:54 ` Eelis
@ 2016-08-31 12:45 ` Jonathan Wakely
2016-09-01 15:14 ` Jonathan Wakely
2016-09-02 18:20 ` Eelis van der Weegen
1 sibling, 2 replies; 16+ messages in thread
From: Jonathan Wakely @ 2016-08-31 12:45 UTC (permalink / raw)
To: Eelis van der Weegen; +Cc: libstdc++, gcc-patches
On 03/05/16 16:42 +0200, Eelis van der Weegen wrote:
>Ah, thanks, I forgot to re-attach when I sent to include the libstdc++ list.
>
>On 2016-05-03 14:38, Jonathan Wakely wrote:
>>ENOPATCH
>>
>>On 1 May 2016 at 15:21, Eelis <eelis@eelis.net> wrote:
>>>Sorry, forgot to include the libstdc++ list.
>>>
>>>On 2016-05-01 16:18, Eelis wrote:
>>>>
>>>>Hi,
>>>>
>>>>The attached patch optimizes std::shuffle for the very common case
>>>>where the generator range is large enough that a single invocation
>>>>can produce two swap positions.
>>>>
>>>>This reduces the runtime of the following testcase by 37% on my machine:
>>>>
>>>> int main()
>>>> {
>>>> std::mt19937 gen;
>>>>
>>>> std::vector<int> v;
>>>> v.reserve(10000);
>>>> for (int i = 0; i != 10000; ++i)
>>>> {
>>>> v.push_back(i);
>>>> std::shuffle(v.begin(), v.end(), gen);
>>>> }
>>>>
>>>> std::cout << v.front() << '\n';
>>>> }
>>>>
>>>>Thoughts?
>>>>
>>>>Thanks,
>>>>
>>>>Eelis
>>>>
>>>
>>>
>
>Index: libstdc++-v3/include/bits/stl_algo.h
>===================================================================
>--- libstdc++-v3/include/bits/stl_algo.h (revision 235680)
>+++ libstdc++-v3/include/bits/stl_algo.h (working copy)
>@@ -3708,6 +3708,22 @@
> #endif
>
> #ifdef _GLIBCXX_USE_C99_STDINT_TR1
>+
>+ template<typename _IntType, typename _UniformRandomNumberGenerator>
We should avoid introducing new names based on "uniform random number
generator" and use _UniformRandomBitGenerator as per
https://wg21.link/p0346r1
>+ inline _IntType
>+ __generate_random_index_below(_IntType __bound, _UniformRandomNumberGenerator& __g)
>+ {
>+ const _IntType __urngrange = __g.max() - __g.min() + 1;
Similarly, let's use __urbgrange here.
>+ const _IntType __scaling = __urngrange / __bound;
I think I'd like either a comment on the function documenting the
assumption about __bound and __g, or an explicit check:
__glibcxx_assert( __scaling >= __bound );
>+ const _IntType __past = __bound * __scaling;
>+
>+ for (;;)
>+ {
>+ const _IntType __r = _IntType(__g()) - __g.min();
>+ if (__r < __past) return __r / __scaling;
This is basically the same algorithm as uniform_int_distribution so
doesn't introduce any bias, right?
Is this significantly faster than just using
uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't
need to duplicate the logic? (And people maintaining the code won't
reconvince themselves it's correct every time they look at it :-)
>+ }
>+ }
>+
> /**
> * @brief Shuffle the elements of a sequence using a uniform random
> * number generator.
>@@ -3740,6 +3756,40 @@
> typedef typename std::make_unsigned<_DistanceType>::type __ud_type;
> typedef typename std::uniform_int_distribution<__ud_type> __distr_type;
> typedef typename __distr_type::param_type __p_type;
>+
>+ typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen;
>+ typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type;
>+
>+ const __uc_type __urngrange = _Gen::max() - _Gen::min() + 1;
>+ const __uc_type __urange = __uc_type(__last - __first);
>+
>+ if (__urngrange / __urange >= __urange)
>+ // I.e. (__urngrange >= __urange * __urange) but without wrap issues.
>+ {
>+ for (_RandomAccessIterator __i = __first + 1; __i != __last; )
>+ {
>+ const __uc_type __swap_range = __uc_type(__i - __first) + 1;
>+
>+ if (__i + 1 == __last)
Could we hoist this test out of the loop somehow?
If we change the loop condition to be __i+1 < __last we don't need to
test it on every iteration, and then after the loop we can just do
the final swap if (__urange % 2).
>+ {
>+ const __uc_type __pos = __generate_random_index_below(__swap_range, __g);
>+ std::iter_swap(__i, __first + __pos);
>+ return;
>+ }
>+
>+ // Use a single generator invocation to produce swap positions for
>+ // both of the next two elements:
>+
>+ const __uc_type __comp_range = __swap_range * (__swap_range + 1);
>+ const __uc_type __pospos = __generate_random_index_below(__comp_range, __g);
>+
>+ std::iter_swap(__i++, __first + (__pospos % __swap_range));
>+ std::iter_swap(__i++, __first + (__pospos / __swap_range));
I think I've convinced myself this is correct :-)
Values of __pospos will be uniformly distributed in [0, __comp_range)
and so the / and % results will be too.
>+ }
>+
>+ return;
>+ }
>+
> __distr_type __d;
>
> for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-08-31 12:45 ` Jonathan Wakely
@ 2016-09-01 15:14 ` Jonathan Wakely
2016-09-01 15:27 ` Marc Glisse
2016-09-01 15:31 ` Eelis van der Weegen
2016-09-02 18:20 ` Eelis van der Weegen
1 sibling, 2 replies; 16+ messages in thread
From: Jonathan Wakely @ 2016-09-01 15:14 UTC (permalink / raw)
To: Eelis van der Weegen; +Cc: libstdc++, gcc-patches
On 31/08/16 13:45 +0100, Jonathan Wakely wrote:
>On 03/05/16 16:42 +0200, Eelis van der Weegen wrote:
>>Ah, thanks, I forgot to re-attach when I sent to include the libstdc++ list.
>>
>>On 2016-05-03 14:38, Jonathan Wakely wrote:
>>>ENOPATCH
>>>
>>>On 1 May 2016 at 15:21, Eelis <eelis@eelis.net> wrote:
>>>>Sorry, forgot to include the libstdc++ list.
>>>>
>>>>On 2016-05-01 16:18, Eelis wrote:
>>>>>
>>>>>Hi,
>>>>>
>>>>>The attached patch optimizes std::shuffle for the very common case
>>>>>where the generator range is large enough that a single invocation
>>>>>can produce two swap positions.
>>>>>
>>>>>This reduces the runtime of the following testcase by 37% on my machine:
>>>>>
>>>>> int main()
>>>>> {
>>>>> std::mt19937 gen;
>>>>>
>>>>> std::vector<int> v;
>>>>> v.reserve(10000);
>>>>> for (int i = 0; i != 10000; ++i)
>>>>> {
>>>>> v.push_back(i);
>>>>> std::shuffle(v.begin(), v.end(), gen);
>>>>> }
>>>>>
>>>>> std::cout << v.front() << '\n';
>>>>> }
>>>>>
>>>>>Thoughts?
>>>>>
>>>>>Thanks,
>>>>>
>>>>>Eelis
>>>>>
>>>>
>>>>
>>
>
>>Index: libstdc++-v3/include/bits/stl_algo.h
>>===================================================================
>>--- libstdc++-v3/include/bits/stl_algo.h (revision 235680)
>>+++ libstdc++-v3/include/bits/stl_algo.h (working copy)
>>@@ -3708,6 +3708,22 @@
>>#endif
>>
>>#ifdef _GLIBCXX_USE_C99_STDINT_TR1
>>+
>>+ template<typename _IntType, typename _UniformRandomNumberGenerator>
>
>We should avoid introducing new names based on "uniform random number
>generator" and use _UniformRandomBitGenerator as per
>https://wg21.link/p0346r1
>
>>+ inline _IntType
>>+ __generate_random_index_below(_IntType __bound, _UniformRandomNumberGenerator& __g)
>>+ {
>>+ const _IntType __urngrange = __g.max() - __g.min() + 1;
>
>Similarly, let's use __urbgrange here.
>
>>+ const _IntType __scaling = __urngrange / __bound;
>
>I think I'd like either a comment on the function documenting the
>assumption about __bound and __g, or an explicit check:
>
> __glibcxx_assert( __scaling >= __bound );
>
>>+ const _IntType __past = __bound * __scaling;
>>+
>>+ for (;;)
>>+ {
>>+ const _IntType __r = _IntType(__g()) - __g.min();
>>+ if (__r < __past) return __r / __scaling;
>
>This is basically the same algorithm as uniform_int_distribution so
>doesn't introduce any bias, right?
>
>Is this significantly faster than just using
>uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't
>need to duplicate the logic? (And people maintaining the code won't
>reconvince themselves it's correct every time they look at it :-)
>
>
>>+ }
>>+ }
>>+
>> /**
>> * @brief Shuffle the elements of a sequence using a uniform random
>> * number generator.
>>@@ -3740,6 +3756,40 @@
>> typedef typename std::make_unsigned<_DistanceType>::type __ud_type;
>> typedef typename std::uniform_int_distribution<__ud_type> __distr_type;
>> typedef typename __distr_type::param_type __p_type;
>>+
>>+ typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen;
>>+ typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type;
>>+
>>+ const __uc_type __urngrange = _Gen::max() - _Gen::min() + 1;
>>+ const __uc_type __urange = __uc_type(__last - __first);
>>+
>>+ if (__urngrange / __urange >= __urange)
>>+ // I.e. (__urngrange >= __urange * __urange) but without wrap issues.
>>+ {
>>+ for (_RandomAccessIterator __i = __first + 1; __i != __last; )
>>+ {
>>+ const __uc_type __swap_range = __uc_type(__i - __first) + 1;
>>+
>>+ if (__i + 1 == __last)
>
>Could we hoist this test out of the loop somehow?
>
>If we change the loop condition to be __i+1 < __last we don't need to
>test it on every iteration, and then after the loop we can just do
>the final swap if (__urange % 2).
>
>>+ {
>>+ const __uc_type __pos = __generate_random_index_below(__swap_range, __g);
>>+ std::iter_swap(__i, __first + __pos);
>>+ return;
>>+ }
>>+
>>+ // Use a single generator invocation to produce swap positions for
>>+ // both of the next two elements:
>>+
>>+ const __uc_type __comp_range = __swap_range * (__swap_range + 1);
>>+ const __uc_type __pospos = __generate_random_index_below(__comp_range, __g);
>>+
>>+ std::iter_swap(__i++, __first + (__pospos % __swap_range));
>>+ std::iter_swap(__i++, __first + (__pospos / __swap_range));
>
>I think I've convinced myself this is correct :-)
>
>Values of __pospos will be uniformly distributed in [0, __comp_range)
iThis is true, but ...
>and so the / and % results will be too.
This isn't.
If __swap_range is 3, then __comp_range is 10 and
__pospos is uniformly distributed in [0, 9].
(__pospos % __swap_range) is not uniformly distributed, we get
P(0) = 0.4, P(1) = 0.3, P(2) = 0.3.
Similarly, (__pospos / __swap_range) is not uniform, we get
P(0) = 0.3, P(1) = 0.3, P(2) = 0.3, P(3) = 0.1
This means that certain permuations of the input are more likely than
others, which fails to meet the requirements of the function.
Or is there a flaw in my reasoning?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-09-01 15:14 ` Jonathan Wakely
@ 2016-09-01 15:27 ` Marc Glisse
2016-09-01 15:35 ` Jonathan Wakely
2016-09-01 15:31 ` Eelis van der Weegen
1 sibling, 1 reply; 16+ messages in thread
From: Marc Glisse @ 2016-09-01 15:27 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: Eelis van der Weegen, libstdc++, gcc-patches
On Thu, 1 Sep 2016, Jonathan Wakely wrote:
>>> + const __uc_type __comp_range = __swap_range * (__swap_range + 1);
>
> If __swap_range is 3, then __comp_range is 10 and
???
--
Marc Glisse
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-09-01 15:14 ` Jonathan Wakely
2016-09-01 15:27 ` Marc Glisse
@ 2016-09-01 15:31 ` Eelis van der Weegen
2016-09-01 15:37 ` Jonathan Wakely
1 sibling, 1 reply; 16+ messages in thread
From: Eelis van der Weegen @ 2016-09-01 15:31 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: libstdc++, gcc-patches
On 2016-09-01 17:14, Jonathan Wakely wrote:
> On 31/08/16 13:45 +0100, Jonathan Wakely wrote:
>> On 03/05/16 16:42 +0200, Eelis van der Weegen wrote:
>>> Ah, thanks, I forgot to re-attach when I sent to include the libstdc++ list.
>>>
>>> On 2016-05-03 14:38, Jonathan Wakely wrote:
>>>> ENOPATCH
>>>>
>>>> On 1 May 2016 at 15:21, Eelis <eelis@eelis.net> wrote:
>>>>> Sorry, forgot to include the libstdc++ list.
>>>>>
>>>>> On 2016-05-01 16:18, Eelis wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The attached patch optimizes std::shuffle for the very common case
>>>>>> where the generator range is large enough that a single invocation
>>>>>> can produce two swap positions.
>>>>>>
>>>>>> This reduces the runtime of the following testcase by 37% on my machine:
>>>>>>
>>>>>> int main()
>>>>>> {
>>>>>> std::mt19937 gen;
>>>>>>
>>>>>> std::vector<int> v;
>>>>>> v.reserve(10000);
>>>>>> for (int i = 0; i != 10000; ++i)
>>>>>> {
>>>>>> v.push_back(i);
>>>>>> std::shuffle(v.begin(), v.end(), gen);
>>>>>> }
>>>>>>
>>>>>> std::cout << v.front() << '\n';
>>>>>> }
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Eelis
>>>>>>
>>>>>
>>>>>
>>>
>>
>>> Index: libstdc++-v3/include/bits/stl_algo.h
>>> ===================================================================
>>> --- libstdc++-v3/include/bits/stl_algo.h (revision 235680)
>>> +++ libstdc++-v3/include/bits/stl_algo.h (working copy)
>>> @@ -3708,6 +3708,22 @@
>>> #endif
>>>
>>> #ifdef _GLIBCXX_USE_C99_STDINT_TR1
>>> +
>>> + template<typename _IntType, typename _UniformRandomNumberGenerator>
>>
>> We should avoid introducing new names based on "uniform random number
>> generator" and use _UniformRandomBitGenerator as per
>> https://wg21.link/p0346r1
>>
>>> + inline _IntType
>>> + __generate_random_index_below(_IntType __bound, _UniformRandomNumberGenerator& __g)
>>> + {
>>> + const _IntType __urngrange = __g.max() - __g.min() + 1;
>>
>> Similarly, let's use __urbgrange here.
>>
>>> + const _IntType __scaling = __urngrange / __bound;
>>
>> I think I'd like either a comment on the function documenting the
>> assumption about __bound and __g, or an explicit check:
>>
>> __glibcxx_assert( __scaling >= __bound );
>>
>>> + const _IntType __past = __bound * __scaling;
>>> +
>>> + for (;;)
>>> + {
>>> + const _IntType __r = _IntType(__g()) - __g.min();
>>> + if (__r < __past) return __r / __scaling;
>>
>> This is basically the same algorithm as uniform_int_distribution so
>> doesn't introduce any bias, right?
>>
>> Is this significantly faster than just using
>> uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't
>> need to duplicate the logic? (And people maintaining the code won't
>> reconvince themselves it's correct every time they look at it :-)
>>
>>
>>> + }
>>> + }
>>> +
>>> /**
>>> * @brief Shuffle the elements of a sequence using a uniform random
>>> * number generator.
>>> @@ -3740,6 +3756,40 @@
>>> typedef typename std::make_unsigned<_DistanceType>::type __ud_type;
>>> typedef typename std::uniform_int_distribution<__ud_type> __distr_type;
>>> typedef typename __distr_type::param_type __p_type;
>>> +
>>> + typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen;
>>> + typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type;
>>> +
>>> + const __uc_type __urngrange = _Gen::max() - _Gen::min() + 1;
>>> + const __uc_type __urange = __uc_type(__last - __first);
>>> +
>>> + if (__urngrange / __urange >= __urange)
>>> + // I.e. (__urngrange >= __urange * __urange) but without wrap issues.
>>> + {
>>> + for (_RandomAccessIterator __i = __first + 1; __i != __last; )
>>> + {
>>> + const __uc_type __swap_range = __uc_type(__i - __first) + 1;
>>> +
>>> + if (__i + 1 == __last)
>>
>> Could we hoist this test out of the loop somehow?
>>
>> If we change the loop condition to be __i+1 < __last we don't need to
>> test it on every iteration, and then after the loop we can just do
>> the final swap if (__urange % 2).
>>
>>> + {
>>> + const __uc_type __pos = __generate_random_index_below(__swap_range, __g);
>>> + std::iter_swap(__i, __first + __pos);
>>> + return;
>>> + }
>>> +
>>> + // Use a single generator invocation to produce swap positions for
>>> + // both of the next two elements:
>>> +
>>> + const __uc_type __comp_range = __swap_range * (__swap_range + 1);
>>> + const __uc_type __pospos = __generate_random_index_below(__comp_range, __g);
>>> +
>>> + std::iter_swap(__i++, __first + (__pospos % __swap_range));
>>> + std::iter_swap(__i++, __first + (__pospos / __swap_range));
>>
>> I think I've convinced myself this is correct :-)
>>
>> Values of __pospos will be uniformly distributed in [0, __comp_range)
>
> iThis is true, but ...
>
>> and so the / and % results will be too.
>
> This isn't.
>
> If __swap_range is 3, then __comp_range is 10 and
> __pospos is uniformly distributed in [0, 9].
>
> (__pospos % __swap_range) is not uniformly distributed, we get
> P(0) = 0.4, P(1) = 0.3, P(2) = 0.3.
>
> Similarly, (__pospos / __swap_range) is not uniform, we get
> P(0) = 0.3, P(1) = 0.3, P(2) = 0.3, P(3) = 0.1
>
> This means that certain permuations of the input are more likely than
> others, which fails to meet the requirements of the function.
>
> Or is there a flaw in my reasoning?
Just that if __swap_range is 3, then __comp_range is 3*(3+1)=12, not 10. :)
Thanks for the review! I'll send an updated patch addressing the other issues soon.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-09-01 15:27 ` Marc Glisse
@ 2016-09-01 15:35 ` Jonathan Wakely
0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Wakely @ 2016-09-01 15:35 UTC (permalink / raw)
To: libstdc++; +Cc: Eelis van der Weegen, gcc-patches
On 01/09/16 17:27 +0200, Marc Glisse wrote:
>On Thu, 1 Sep 2016, Jonathan Wakely wrote:
>
>>>>+ const __uc_type __comp_range = __swap_range * (__swap_range + 1);
>>
>>If __swap_range is 3, then __comp_range is 10 and
>
>???
Bah :-)
Thanks. I guess I read the code correctly the other day at least!
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-09-01 15:31 ` Eelis van der Weegen
@ 2016-09-01 15:37 ` Jonathan Wakely
0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Wakely @ 2016-09-01 15:37 UTC (permalink / raw)
To: Eelis van der Weegen; +Cc: libstdc++, gcc-patches
On 01/09/16 17:31 +0200, Eelis van der Weegen wrote:
>On 2016-09-01 17:14, Jonathan Wakely wrote:
>>On 31/08/16 13:45 +0100, Jonathan Wakely wrote:
>>>On 03/05/16 16:42 +0200, Eelis van der Weegen wrote:
>>>>Ah, thanks, I forgot to re-attach when I sent to include the libstdc++ list.
>>>>
>>>>On 2016-05-03 14:38, Jonathan Wakely wrote:
>>>>>ENOPATCH
>>>>>
>>>>>On 1 May 2016 at 15:21, Eelis <eelis@eelis.net> wrote:
>>>>>>Sorry, forgot to include the libstdc++ list.
>>>>>>
>>>>>>On 2016-05-01 16:18, Eelis wrote:
>>>>>>>
>>>>>>>Hi,
>>>>>>>
>>>>>>>The attached patch optimizes std::shuffle for the very common case
>>>>>>>where the generator range is large enough that a single invocation
>>>>>>>can produce two swap positions.
>>>>>>>
>>>>>>>This reduces the runtime of the following testcase by 37% on my machine:
>>>>>>>
>>>>>>> int main()
>>>>>>> {
>>>>>>> std::mt19937 gen;
>>>>>>>
>>>>>>> std::vector<int> v;
>>>>>>> v.reserve(10000);
>>>>>>> for (int i = 0; i != 10000; ++i)
>>>>>>> {
>>>>>>> v.push_back(i);
>>>>>>> std::shuffle(v.begin(), v.end(), gen);
>>>>>>> }
>>>>>>>
>>>>>>> std::cout << v.front() << '\n';
>>>>>>> }
>>>>>>>
>>>>>>>Thoughts?
>>>>>>>
>>>>>>>Thanks,
>>>>>>>
>>>>>>>Eelis
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>>>Index: libstdc++-v3/include/bits/stl_algo.h
>>>>===================================================================
>>>>--- libstdc++-v3/include/bits/stl_algo.h (revision 235680)
>>>>+++ libstdc++-v3/include/bits/stl_algo.h (working copy)
>>>>@@ -3708,6 +3708,22 @@
>>>>#endif
>>>>
>>>>#ifdef _GLIBCXX_USE_C99_STDINT_TR1
>>>>+
>>>>+ template<typename _IntType, typename _UniformRandomNumberGenerator>
>>>
>>>We should avoid introducing new names based on "uniform random number
>>>generator" and use _UniformRandomBitGenerator as per
>>>https://wg21.link/p0346r1
>>>
>>>>+ inline _IntType
>>>>+ __generate_random_index_below(_IntType __bound, _UniformRandomNumberGenerator& __g)
>>>>+ {
>>>>+ const _IntType __urngrange = __g.max() - __g.min() + 1;
>>>
>>>Similarly, let's use __urbgrange here.
>>>
>>>>+ const _IntType __scaling = __urngrange / __bound;
>>>
>>>I think I'd like either a comment on the function documenting the
>>>assumption about __bound and __g, or an explicit check:
>>>
>>> __glibcxx_assert( __scaling >= __bound );
>>>
>>>>+ const _IntType __past = __bound * __scaling;
>>>>+
>>>>+ for (;;)
>>>>+ {
>>>>+ const _IntType __r = _IntType(__g()) - __g.min();
>>>>+ if (__r < __past) return __r / __scaling;
>>>
>>>This is basically the same algorithm as uniform_int_distribution so
>>>doesn't introduce any bias, right?
>>>
>>>Is this significantly faster than just using
>>>uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't
>>>need to duplicate the logic? (And people maintaining the code won't
>>>reconvince themselves it's correct every time they look at it :-)
>>>
>>>
>>>>+ }
>>>>+ }
>>>>+
>>>> /**
>>>> * @brief Shuffle the elements of a sequence using a uniform random
>>>> * number generator.
>>>>@@ -3740,6 +3756,40 @@
>>>> typedef typename std::make_unsigned<_DistanceType>::type __ud_type;
>>>> typedef typename std::uniform_int_distribution<__ud_type> __distr_type;
>>>> typedef typename __distr_type::param_type __p_type;
>>>>+
>>>>+ typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen;
>>>>+ typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type;
>>>>+
>>>>+ const __uc_type __urngrange = _Gen::max() - _Gen::min() + 1;
>>>>+ const __uc_type __urange = __uc_type(__last - __first);
>>>>+
>>>>+ if (__urngrange / __urange >= __urange)
>>>>+ // I.e. (__urngrange >= __urange * __urange) but without wrap issues.
>>>>+ {
>>>>+ for (_RandomAccessIterator __i = __first + 1; __i != __last; )
>>>>+ {
>>>>+ const __uc_type __swap_range = __uc_type(__i - __first) + 1;
>>>>+
>>>>+ if (__i + 1 == __last)
>>>
>>>Could we hoist this test out of the loop somehow?
>>>
>>>If we change the loop condition to be __i+1 < __last we don't need to
>>>test it on every iteration, and then after the loop we can just do
>>>the final swap if (__urange % 2).
>>>
>>>>+ {
>>>>+ const __uc_type __pos = __generate_random_index_below(__swap_range, __g);
>>>>+ std::iter_swap(__i, __first + __pos);
>>>>+ return;
>>>>+ }
>>>>+
>>>>+ // Use a single generator invocation to produce swap positions for
>>>>+ // both of the next two elements:
>>>>+
>>>>+ const __uc_type __comp_range = __swap_range * (__swap_range + 1);
>>>>+ const __uc_type __pospos = __generate_random_index_below(__comp_range, __g);
>>>>+
>>>>+ std::iter_swap(__i++, __first + (__pospos % __swap_range));
>>>>+ std::iter_swap(__i++, __first + (__pospos / __swap_range));
>>>
>>>I think I've convinced myself this is correct :-)
>>>
>>>Values of __pospos will be uniformly distributed in [0, __comp_range)
>>
>>iThis is true, but ...
>>
>>>and so the / and % results will be too.
>>
>>This isn't.
>>
>>If __swap_range is 3, then __comp_range is 10 and
>>__pospos is uniformly distributed in [0, 9].
>>
>>(__pospos % __swap_range) is not uniformly distributed, we get
>>P(0) = 0.4, P(1) = 0.3, P(2) = 0.3.
>>
>>Similarly, (__pospos / __swap_range) is not uniform, we get
>>P(0) = 0.3, P(1) = 0.3, P(2) = 0.3, P(3) = 0.1
>>
>>This means that certain permuations of the input are more likely than
>>others, which fails to meet the requirements of the function.
>>
>>Or is there a flaw in my reasoning?
>
>Just that if __swap_range is 3, then __comp_range is 3*(3+1)=12, not 10. :)
A minor detail!
>Thanks for the review! I'll send an updated patch addressing the other issues soon.
I think my earlier comments are still sane, just ignore today's hasty email.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-08-31 12:45 ` Jonathan Wakely
2016-09-01 15:14 ` Jonathan Wakely
@ 2016-09-02 18:20 ` Eelis van der Weegen
2016-09-02 18:53 ` Eelis
1 sibling, 1 reply; 16+ messages in thread
From: Eelis van der Weegen @ 2016-09-02 18:20 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: libstdc++, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 816 bytes --]
On 2016-08-31 14:45, Jonathan Wakely wrote:
> Is this significantly faster than just using
> uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't
> need to duplicate the logic? (And people maintaining the code won't
> reconvince themselves it's correct every time they look at it :-)
>
>[..]
>
> Could we hoist this test out of the loop somehow?
>
> If we change the loop condition to be __i+1 < __last we don't need to
> test it on every iteration, and then after the loop we can just do
> the final swap if (__urange % 2).
Reusing std::uniform_int_distribution seems just as fast, so I've removed __generate_random_index_below.
I've hoisted the (__i + 1 == __last) check out of the loop (in a slightly different way), and it seems to shave off a couple more cycles, yay!
Updated patch attached.
[-- Attachment #2: new-double-step-shuffle.patch --]
[-- Type: text/x-patch, Size: 1940 bytes --]
Index: libstdc++-v3/include/bits/stl_algo.h
===================================================================
--- libstdc++-v3/include/bits/stl_algo.h (revision 239895)
+++ libstdc++-v3/include/bits/stl_algo.h (working copy)
@@ -3772,6 +3772,46 @@
typedef typename std::make_unsigned<_DistanceType>::type __ud_type;
typedef typename std::uniform_int_distribution<__ud_type> __distr_type;
typedef typename __distr_type::param_type __p_type;
+
+ typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen;
+ typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type;
+
+ const __uc_type __urngrange = __g.max() - __g.min();
+ const __uc_type __urange = __uc_type(__last - __first);
+
+ if (__urngrange / __urange >= __urange)
+ // I.e. (__urngrange >= __urange * __urange) but without wrap issues.
+ {
+ _RandomAccessIterator __i = __first + 1;
+
+ // Since we know the range isn't empty, an even number of elements
+ // means an uneven number of elements /to swap/, in which case we
+ // do the first one up front:
+
+ if ((__urange % 2) == 0)
+ {
+ std::iter_swap(__i++, __first + (__g() & 1));
+ }
+
+ // Now we know that __last - __i is even, so we do the rest in pairs,
+ // using a single distribution invocation to produce swap positions
+ // for two successive elements at a time:
+
+ while (__i != __last)
+ {
+ const __uc_type __swap_range = __uc_type(__i - __first) + 1;
+ const __uc_type __comp_range = __swap_range * (__swap_range + 1);
+
+ std::uniform_int_distribution<__uc_type> __d{0, __comp_range - 1};
+ const __uc_type __pospos = __d(__g);
+
+ std::iter_swap(__i++, __first + (__pospos % __swap_range));
+ std::iter_swap(__i++, __first + (__pospos / __swap_range));
+ }
+
+ return;
+ }
+
__distr_type __d;
for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-09-02 18:20 ` Eelis van der Weegen
@ 2016-09-02 18:53 ` Eelis
2016-09-02 19:27 ` Eelis
2016-10-14 19:42 ` Jonathan Wakely
0 siblings, 2 replies; 16+ messages in thread
From: Eelis @ 2016-09-02 18:53 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: libstdc++, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1093 bytes --]
On 2016-09-02 20:20, Eelis van der Weegen wrote:
> On 2016-08-31 14:45, Jonathan Wakely wrote:
>> Is this significantly faster than just using
>> uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't
>> need to duplicate the logic? (And people maintaining the code won't
>> reconvince themselves it's correct every time they look at it :-)
>>
>> [..]
>>
>> Could we hoist this test out of the loop somehow?
>>
>> If we change the loop condition to be __i+1 < __last we don't need to
>> test it on every iteration, and then after the loop we can just do
>> the final swap if (__urange % 2).
>
> Reusing std::uniform_int_distribution seems just as fast, so I've removed __generate_random_index_below.
>
> I've hoisted the (__i + 1 == __last) check out of the loop (in a slightly different way), and it seems to shave off a couple more cycles, yay!
>
> Updated patch attached.
>
Please ignore that patch, I used __g()&1 but that's invalid (the new "UniformRandomBitGenerator" name is misleading).
Updated patch (which uses a proper distribution even for the [0,1] case) attached.
[-- Attachment #2: newer-double-step-shuffle.patch --]
[-- Type: text/x-patch, Size: 1965 bytes --]
Index: libstdc++-v3/include/bits/stl_algo.h
===================================================================
--- libstdc++-v3/include/bits/stl_algo.h (revision 239895)
+++ libstdc++-v3/include/bits/stl_algo.h (working copy)
@@ -3772,6 +3772,47 @@
typedef typename std::make_unsigned<_DistanceType>::type __ud_type;
typedef typename std::uniform_int_distribution<__ud_type> __distr_type;
typedef typename __distr_type::param_type __p_type;
+
+ typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen;
+ typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type;
+
+ const __uc_type __urngrange = __g.max() - __g.min();
+ const __uc_type __urange = __uc_type(__last - __first);
+
+ if (__urngrange / __urange >= __urange)
+ // I.e. (__urngrange >= __urange * __urange) but without wrap issues.
+ {
+ _RandomAccessIterator __i = __first + 1;
+
+ // Since we know the range isn't empty, an even number of elements
+ // means an uneven number of elements /to swap/, in which case we
+ // do the first one up front:
+
+ if ((__urange % 2) == 0)
+ {
+ __distr_type __d{0, 1};
+ std::iter_swap(__i++, __first + __d(__g));
+ }
+
+ // Now we know that __last - __i is even, so we do the rest in pairs,
+ // using a single distribution invocation to produce swap positions
+ // for two successive elements at a time:
+
+ while (__i != __last)
+ {
+ const __uc_type __swap_range = __uc_type(__i - __first) + 1;
+ const __uc_type __comp_range = __swap_range * (__swap_range + 1);
+
+ std::uniform_int_distribution<__uc_type> __d{0, __comp_range - 1};
+ const __uc_type __pospos = __d(__g);
+
+ std::iter_swap(__i++, __first + (__pospos % __swap_range));
+ std::iter_swap(__i++, __first + (__pospos / __swap_range));
+ }
+
+ return;
+ }
+
__distr_type __d;
for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-09-02 18:53 ` Eelis
@ 2016-09-02 19:27 ` Eelis
2016-10-14 19:42 ` Jonathan Wakely
1 sibling, 0 replies; 16+ messages in thread
From: Eelis @ 2016-09-02 19:27 UTC (permalink / raw)
To: gcc-patches; +Cc: libstdc++
[-- Attachment #1: Type: text/plain, Size: 1093 bytes --]
On 2016-09-02 20:20, Eelis van der Weegen wrote:
> On 2016-08-31 14:45, Jonathan Wakely wrote:
>> Is this significantly faster than just using
>> uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't
>> need to duplicate the logic? (And people maintaining the code won't
>> reconvince themselves it's correct every time they look at it :-)
>>
>> [..]
>>
>> Could we hoist this test out of the loop somehow?
>>
>> If we change the loop condition to be __i+1 < __last we don't need to
>> test it on every iteration, and then after the loop we can just do
>> the final swap if (__urange % 2).
>
> Reusing std::uniform_int_distribution seems just as fast, so I've removed __generate_random_index_below.
>
> I've hoisted the (__i + 1 == __last) check out of the loop (in a slightly different way), and it seems to shave off a couple more cycles, yay!
>
> Updated patch attached.
>
Please ignore that patch, I used __g()&1 but that's invalid (the new "UniformRandomBitGenerator" name is misleading).
Updated patch (which uses a proper distribution even for the [0,1] case) attached.
[-- Attachment #2: newer-double-step-shuffle.patch --]
[-- Type: text/x-patch, Size: 1965 bytes --]
Index: libstdc++-v3/include/bits/stl_algo.h
===================================================================
--- libstdc++-v3/include/bits/stl_algo.h (revision 239895)
+++ libstdc++-v3/include/bits/stl_algo.h (working copy)
@@ -3772,6 +3772,47 @@
typedef typename std::make_unsigned<_DistanceType>::type __ud_type;
typedef typename std::uniform_int_distribution<__ud_type> __distr_type;
typedef typename __distr_type::param_type __p_type;
+
+ typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen;
+ typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type;
+
+ const __uc_type __urngrange = __g.max() - __g.min();
+ const __uc_type __urange = __uc_type(__last - __first);
+
+ if (__urngrange / __urange >= __urange)
+ // I.e. (__urngrange >= __urange * __urange) but without wrap issues.
+ {
+ _RandomAccessIterator __i = __first + 1;
+
+ // Since we know the range isn't empty, an even number of elements
+ // means an uneven number of elements /to swap/, in which case we
+ // do the first one up front:
+
+ if ((__urange % 2) == 0)
+ {
+ __distr_type __d{0, 1};
+ std::iter_swap(__i++, __first + __d(__g));
+ }
+
+ // Now we know that __last - __i is even, so we do the rest in pairs,
+ // using a single distribution invocation to produce swap positions
+ // for two successive elements at a time:
+
+ while (__i != __last)
+ {
+ const __uc_type __swap_range = __uc_type(__i - __first) + 1;
+ const __uc_type __comp_range = __swap_range * (__swap_range + 1);
+
+ std::uniform_int_distribution<__uc_type> __d{0, __comp_range - 1};
+ const __uc_type __pospos = __d(__g);
+
+ std::iter_swap(__i++, __first + (__pospos % __swap_range));
+ std::iter_swap(__i++, __first + (__pospos / __swap_range));
+ }
+
+ return;
+ }
+
__distr_type __d;
for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
2016-09-02 18:53 ` Eelis
2016-09-02 19:27 ` Eelis
@ 2016-10-14 19:42 ` Jonathan Wakely
1 sibling, 0 replies; 16+ messages in thread
From: Jonathan Wakely @ 2016-10-14 19:42 UTC (permalink / raw)
To: Eelis; +Cc: libstdc++, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1256 bytes --]
On 02/09/16 20:53 +0200, Eelis wrote:
>On 2016-09-02 20:20, Eelis van der Weegen wrote:
>>On 2016-08-31 14:45, Jonathan Wakely wrote:
>>>Is this significantly faster than just using
>>>uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't
>>>need to duplicate the logic? (And people maintaining the code won't
>>>reconvince themselves it's correct every time they look at it :-)
>>>
>>>[..]
>>>
>>>Could we hoist this test out of the loop somehow?
>>>
>>>If we change the loop condition to be __i+1 < __last we don't need to
>>>test it on every iteration, and then after the loop we can just do
>>>the final swap if (__urange % 2).
>>
>>Reusing std::uniform_int_distribution seems just as fast, so I've removed __generate_random_index_below.
>>
>>I've hoisted the (__i + 1 == __last) check out of the loop (in a slightly different way), and it seems to shave off a couple more cycles, yay!
>>
>>Updated patch attached.
>>
>
>Please ignore that patch, I used __g()&1 but that's invalid (the new "UniformRandomBitGenerator" name is misleading).
>
>Updated patch (which uses a proper distribution even for the [0,1] case) attached.
I've finally got round to committing this patch to trunk.
Thanks for your patience, and sorry for the delay!
[-- Attachment #2: patch.txt --]
[-- Type: text/x-patch, Size: 2384 bytes --]
commit 1ddf9566764a85da4826628098b352bd30ba2bbc
Author: Jonathan Wakely <jwakely@redhat.com>
Date: Fri Oct 14 20:27:13 2016 +0100
Optimize std::shuffle by using generator to get two values at once
2016-10-14 Eelis van der Weegen <eelis@eelis.net>
* include/bits/stl_algo.h (shuffle): Extract two random numbers from
each generator invocation when its range is large enough.
diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 0538a79..db99cb8 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -3772,6 +3772,47 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
typedef typename std::make_unsigned<_DistanceType>::type __ud_type;
typedef typename std::uniform_int_distribution<__ud_type> __distr_type;
typedef typename __distr_type::param_type __p_type;
+
+ typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen;
+ typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type;
+
+ const __uc_type __urngrange = __g.max() - __g.min();
+ const __uc_type __urange = __uc_type(__last - __first);
+
+ if (__urngrange / __urange >= __urange)
+ // I.e. (__urngrange >= __urange * __urange) but without wrap issues.
+ {
+ _RandomAccessIterator __i = __first + 1;
+
+ // Since we know the range isn't empty, an even number of elements
+ // means an uneven number of elements /to swap/, in which case we
+ // do the first one up front:
+
+ if ((__urange % 2) == 0)
+ {
+ __distr_type __d{0, 1};
+ std::iter_swap(__i++, __first + __d(__g));
+ }
+
+ // Now we know that __last - __i is even, so we do the rest in pairs,
+ // using a single distribution invocation to produce swap positions
+ // for two successive elements at a time:
+
+ while (__i != __last)
+ {
+ const __uc_type __swap_range = __uc_type(__i - __first) + 1;
+ const __uc_type __comp_range = __swap_range * (__swap_range + 1);
+
+ std::uniform_int_distribution<__uc_type> __d{0, __comp_range - 1};
+ const __uc_type __pospos = __d(__g);
+
+ std::iter_swap(__i++, __first + (__pospos % __swap_range));
+ std::iter_swap(__i++, __first + (__pospos / __swap_range));
+ }
+
+ return;
+ }
+
__distr_type __d;
for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2016-10-14 19:42 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-01 14:18 [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible Eelis
2016-05-01 14:30 ` Eelis
2016-05-03 12:39 ` Jonathan Wakely
2016-05-03 14:42 ` Eelis van der Weegen
2016-05-25 19:54 ` Eelis
2016-05-25 20:45 ` Eelis
2016-08-31 12:45 ` Jonathan Wakely
2016-09-01 15:14 ` Jonathan Wakely
2016-09-01 15:27 ` Marc Glisse
2016-09-01 15:35 ` Jonathan Wakely
2016-09-01 15:31 ` Eelis van der Weegen
2016-09-01 15:37 ` Jonathan Wakely
2016-09-02 18:20 ` Eelis van der Weegen
2016-09-02 18:53 ` Eelis
2016-09-02 19:27 ` Eelis
2016-10-14 19:42 ` Jonathan Wakely
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).