From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-435090-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 109487 invoked by alias); 1 Sep 2016 15:31:53 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 108876 invoked by uid 89); 1 Sep 2016 15:31:52 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RCVD_IN_DNSWL_LOW autolearn=no version=3.3.2 spammy=(unknown), H*MI:sk:2016090, meet, H*r:esmtpa
X-Spam-User: qpsmtpd, 2 recipients
X-HELO: smtpq4.tb.mail.iss.as9143.net
Received: from smtpq4.tb.mail.iss.as9143.net (HELO smtpq4.tb.mail.iss.as9143.net) (212.54.42.167) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 01 Sep 2016 15:31:42 +0000
Received: from [212.54.42.117] (helo=lsmtp3.tb.mail.iss.as9143.net)	by smtpq4.tb.mail.iss.as9143.net with esmtp (Exim 4.82)	(envelope-from <eelis@eelis.net>)	id 1bfTxr-0005fP-K0; Thu, 01 Sep 2016 17:31:39 +0200
Received: from i23017.upc-i.chello.nl ([62.195.23.17] helo=[192.168.0.16])	by lsmtp3.tb.mail.iss.as9143.net with esmtpa (Exim 4.82)	(envelope-from <eelis@eelis.net>)	id 1bfTxr-0001iJ-Gz; Thu, 01 Sep 2016 17:31:39 +0200
Subject: Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible
To: Jonathan Wakely <jwakely@redhat.com>
References: <ng536o$4iu$1@ger.gmane.org> <ng53cl$4iu$2@ger.gmane.org> <CAH6eHdTnASg7A5USA_sTAHXPfUFmkrZvF27tZUJhj5ywwDVDag@mail.gmail.com> <5728B8CC.7030405@eelis.net> <20160831124502.GH3342@redhat.com> <20160901151446.GP3342@redhat.com>
Cc: libstdc++ <libstdc++@gcc.gnu.org>, gcc-patches <gcc-patches@gcc.gnu.org>
From: Eelis van der Weegen <eelis@eelis.net>
Message-ID: <57C849D6.8090508@eelis.net>
Date: Thu, 01 Sep 2016 15:31:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.0
MIME-Version: 1.0
In-Reply-To: <20160901151446.GP3342@redhat.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-Ziggo-spambar: /
X-Ziggo-spamscore: 0.0
X-Ziggo-spamreport: CMAE Analysis: v=2.1 cv=JtMM15MC c=1 sm=0 tr=0 a=9+rZDBEiDlHhcck0kWbJtElFXBc=:19 a=N659UExz7-8A:10 a=GW1xBdLrtEIA:10 a=af5Vv7xOAAAA:8 a=iSIRHA6BAAAA:8 a=_RazfL_mrlyazeFuR5kA:9 a=s1iOnOrNGoRHd4_z:21 a=4C6sm_I_CGO12z5O:21 a=pILNOxqGKmIA:10 a=-FEs8UIgK8oA:10 a=NWVoK91CQyQA:10 a=aUhdSXYnlrUo_HJWavH_:22 a=k9eN9M0LngA2N9Id7KX9:22 xcat=Undefined/Undefined none
X-Ziggo-Spam-Status: No
X-SW-Source: 2016-09/txt/msg00053.txt.bz2

On 2016-09-01 17:14, Jonathan Wakely wrote:
> On 31/08/16 13:45 +0100, Jonathan Wakely wrote:
>> On 03/05/16 16:42 +0200, Eelis van der Weegen wrote:
>>> Ah, thanks, I forgot to re-attach when I sent to include the libstdc++ list.
>>>
>>> On 2016-05-03 14:38, Jonathan Wakely wrote:
>>>> ENOPATCH
>>>>
>>>> On 1 May 2016 at 15:21, Eelis <eelis@eelis.net> wrote:
>>>>> Sorry, forgot to include the libstdc++ list.
>>>>>
>>>>> On 2016-05-01 16:18, Eelis wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The attached patch optimizes std::shuffle for the very common case
>>>>>> where the generator range is large enough that a single invocation
>>>>>> can produce two swap positions.
>>>>>>
>>>>>> This reduces the runtime of the following testcase by 37% on my machine:
>>>>>>
>>>>>>     int main()
>>>>>>     {
>>>>>>         std::mt19937 gen;
>>>>>>
>>>>>>         std::vector<int> v;
>>>>>>         v.reserve(10000);
>>>>>>         for (int i = 0; i != 10000; ++i)
>>>>>>         {
>>>>>>             v.push_back(i);
>>>>>>             std::shuffle(v.begin(), v.end(), gen);
>>>>>>         }
>>>>>>
>>>>>>         std::cout << v.front() << '\n';
>>>>>>     }
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Eelis
>>>>>>
>>>>>
>>>>>
>>>
>>
>>> Index: libstdc++-v3/include/bits/stl_algo.h
>>> ===================================================================
>>> --- libstdc++-v3/include/bits/stl_algo.h    (revision 235680)
>>> +++ libstdc++-v3/include/bits/stl_algo.h    (working copy)
>>> @@ -3708,6 +3708,22 @@
>>> #endif
>>>
>>> #ifdef _GLIBCXX_USE_C99_STDINT_TR1
>>> +
>>> +  template<typename _IntType, typename _UniformRandomNumberGenerator>
>>
>> We should avoid introducing new names based on "uniform random number
>> generator" and use _UniformRandomBitGenerator as per
>> https://wg21.link/p0346r1
>>
>>> +    inline _IntType
>>> +    __generate_random_index_below(_IntType __bound, _UniformRandomNumberGenerator& __g)
>>> +    {
>>> +      const _IntType __urngrange = __g.max() - __g.min() + 1;
>>
>> Similarly, let's use __urbgrange here.
>>
>>> +      const _IntType __scaling = __urngrange / __bound;
>>
>> I think I'd like either a comment on the function documenting the
>> assumption about __bound and __g, or an explicit check:
>>
>>       __glibcxx_assert( __scaling >= __bound );
>>
>>> +      const _IntType __past = __bound * __scaling;
>>> +
>>> +      for (;;)
>>> +      {
>>> +    const _IntType __r = _IntType(__g()) - __g.min();
>>> +    if (__r < __past) return __r / __scaling;
>>
>> This is basically the same algorithm as uniform_int_distribution so
>> doesn't introduce any bias, right?
>>
>> Is this significantly faster than just using
>> uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't
>> need to duplicate the logic? (And people maintaining the code won't
>> reconvince themselves it's correct every time they look at it :-)
>>
>>
>>> +      }
>>> +    }
>>> +
>>>  /**
>>>   *  @brief Shuffle the elements of a sequence using a uniform random
>>>   *         number generator.
>>> @@ -3740,6 +3756,40 @@
>>>      typedef typename std::make_unsigned<_DistanceType>::type __ud_type;
>>>      typedef typename std::uniform_int_distribution<__ud_type> __distr_type;
>>>      typedef typename __distr_type::param_type __p_type;
>>> +
>>> +      typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen;
>>> +      typedef typename std::common_type<typename _Gen::result_type, __ud_type>::type __uc_type;
>>> +
>>> +      const __uc_type __urngrange = _Gen::max() - _Gen::min() + 1;
>>> +      const __uc_type __urange = __uc_type(__last - __first);
>>> +
>>> +      if (__urngrange / __urange >= __urange)
>>> +        // I.e. (__urngrange >= __urange * __urange) but without wrap issues.
>>> +      {
>>> +    for (_RandomAccessIterator __i = __first + 1; __i != __last; )
>>> +    {
>>> +      const __uc_type __swap_range = __uc_type(__i - __first) + 1;
>>> +
>>> +      if (__i + 1 == __last)
>>
>> Could we hoist this test out of the loop somehow?
>>
>> If we change the loop condition to be __i+1 < __last we don't need to
>> test it on every iteration, and then after the loop we can just do
>> the final swap if (__urange % 2).
>>
>>> +      {
>>> +        const __uc_type __pos = __generate_random_index_below(__swap_range, __g);
>>> +        std::iter_swap(__i, __first + __pos);
>>> +        return;
>>> +      }
>>> +
>>> +      // Use a single generator invocation to produce swap positions for
>>> +      // both of the next two elements:
>>> +
>>> +      const __uc_type __comp_range = __swap_range * (__swap_range + 1);
>>> +      const __uc_type __pospos = __generate_random_index_below(__comp_range, __g);
>>> +
>>> +      std::iter_swap(__i++, __first + (__pospos % __swap_range));
>>> +      std::iter_swap(__i++, __first + (__pospos / __swap_range));
>>
>> I think I've convinced myself this is correct :-)
>>
>> Values of __pospos will be uniformly distributed in [0, __comp_range)
>
> iThis is true, but ...
>
>> and so the / and % results will be too.
>
> This isn't.
>
> If __swap_range is 3, then __comp_range is 10 and
> __pospos is uniformly distributed in [0, 9].
>
> (__pospos % __swap_range) is not uniformly distributed, we get
> P(0) = 0.4, P(1) = 0.3, P(2) = 0.3.
>
> Similarly, (__pospos / __swap_range) is not uniform, we get
> P(0) = 0.3, P(1) = 0.3, P(2) = 0.3, P(3) = 0.1
>
> This means that certain permuations of the input are more likely than
> others, which fails to meet the requirements of the function.
>
> Or is there a flaw in my reasoning?

Just that if __swap_range is 3, then __comp_range is 3*(3+1)=12, not 10. :)

Thanks for the review! I'll send an updated patch addressing the other issues soon.