From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 109487 invoked by alias); 1 Sep 2016 15:31:53 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 108876 invoked by uid 89); 1 Sep 2016 15:31:52 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RCVD_IN_DNSWL_LOW autolearn=no version=3.3.2 spammy=(unknown), H*MI:sk:2016090, meet, H*r:esmtpa X-Spam-User: qpsmtpd, 2 recipients X-HELO: smtpq4.tb.mail.iss.as9143.net Received: from smtpq4.tb.mail.iss.as9143.net (HELO smtpq4.tb.mail.iss.as9143.net) (212.54.42.167) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 01 Sep 2016 15:31:42 +0000 Received: from [212.54.42.117] (helo=lsmtp3.tb.mail.iss.as9143.net) by smtpq4.tb.mail.iss.as9143.net with esmtp (Exim 4.82) (envelope-from ) id 1bfTxr-0005fP-K0; Thu, 01 Sep 2016 17:31:39 +0200 Received: from i23017.upc-i.chello.nl ([62.195.23.17] helo=[192.168.0.16]) by lsmtp3.tb.mail.iss.as9143.net with esmtpa (Exim 4.82) (envelope-from ) id 1bfTxr-0001iJ-Gz; Thu, 01 Sep 2016 17:31:39 +0200 Subject: Re: [patch, libstdc++] std::shuffle: Generate two swap positions at a time if possible To: Jonathan Wakely References: <5728B8CC.7030405@eelis.net> <20160831124502.GH3342@redhat.com> <20160901151446.GP3342@redhat.com> Cc: libstdc++ , gcc-patches From: Eelis van der Weegen Message-ID: <57C849D6.8090508@eelis.net> Date: Thu, 01 Sep 2016 15:31:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.0 MIME-Version: 1.0 In-Reply-To: <20160901151446.GP3342@redhat.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Ziggo-spambar: / X-Ziggo-spamscore: 0.0 X-Ziggo-spamreport: CMAE Analysis: v=2.1 cv=JtMM15MC c=1 sm=0 tr=0 a=9+rZDBEiDlHhcck0kWbJtElFXBc=:19 a=N659UExz7-8A:10 a=GW1xBdLrtEIA:10 a=af5Vv7xOAAAA:8 a=iSIRHA6BAAAA:8 a=_RazfL_mrlyazeFuR5kA:9 a=s1iOnOrNGoRHd4_z:21 a=4C6sm_I_CGO12z5O:21 a=pILNOxqGKmIA:10 a=-FEs8UIgK8oA:10 a=NWVoK91CQyQA:10 a=aUhdSXYnlrUo_HJWavH_:22 a=k9eN9M0LngA2N9Id7KX9:22 xcat=Undefined/Undefined none X-Ziggo-Spam-Status: No X-SW-Source: 2016-09/txt/msg00053.txt.bz2 On 2016-09-01 17:14, Jonathan Wakely wrote: > On 31/08/16 13:45 +0100, Jonathan Wakely wrote: >> On 03/05/16 16:42 +0200, Eelis van der Weegen wrote: >>> Ah, thanks, I forgot to re-attach when I sent to include the libstdc++ list. >>> >>> On 2016-05-03 14:38, Jonathan Wakely wrote: >>>> ENOPATCH >>>> >>>> On 1 May 2016 at 15:21, Eelis wrote: >>>>> Sorry, forgot to include the libstdc++ list. >>>>> >>>>> On 2016-05-01 16:18, Eelis wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> The attached patch optimizes std::shuffle for the very common case >>>>>> where the generator range is large enough that a single invocation >>>>>> can produce two swap positions. >>>>>> >>>>>> This reduces the runtime of the following testcase by 37% on my machine: >>>>>> >>>>>> int main() >>>>>> { >>>>>> std::mt19937 gen; >>>>>> >>>>>> std::vector v; >>>>>> v.reserve(10000); >>>>>> for (int i = 0; i != 10000; ++i) >>>>>> { >>>>>> v.push_back(i); >>>>>> std::shuffle(v.begin(), v.end(), gen); >>>>>> } >>>>>> >>>>>> std::cout << v.front() << '\n'; >>>>>> } >>>>>> >>>>>> Thoughts? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Eelis >>>>>> >>>>> >>>>> >>> >> >>> Index: libstdc++-v3/include/bits/stl_algo.h >>> =================================================================== >>> --- libstdc++-v3/include/bits/stl_algo.h (revision 235680) >>> +++ libstdc++-v3/include/bits/stl_algo.h (working copy) >>> @@ -3708,6 +3708,22 @@ >>> #endif >>> >>> #ifdef _GLIBCXX_USE_C99_STDINT_TR1 >>> + >>> + template >> >> We should avoid introducing new names based on "uniform random number >> generator" and use _UniformRandomBitGenerator as per >> https://wg21.link/p0346r1 >> >>> + inline _IntType >>> + __generate_random_index_below(_IntType __bound, _UniformRandomNumberGenerator& __g) >>> + { >>> + const _IntType __urngrange = __g.max() - __g.min() + 1; >> >> Similarly, let's use __urbgrange here. >> >>> + const _IntType __scaling = __urngrange / __bound; >> >> I think I'd like either a comment on the function documenting the >> assumption about __bound and __g, or an explicit check: >> >> __glibcxx_assert( __scaling >= __bound ); >> >>> + const _IntType __past = __bound * __scaling; >>> + >>> + for (;;) >>> + { >>> + const _IntType __r = _IntType(__g()) - __g.min(); >>> + if (__r < __past) return __r / __scaling; >> >> This is basically the same algorithm as uniform_int_distribution so >> doesn't introduce any bias, right? >> >> Is this significantly faster than just using >> uniform_int_distribution<_IntType>{0, __bound - 1}(__g) so we don't >> need to duplicate the logic? (And people maintaining the code won't >> reconvince themselves it's correct every time they look at it :-) >> >> >>> + } >>> + } >>> + >>> /** >>> * @brief Shuffle the elements of a sequence using a uniform random >>> * number generator. >>> @@ -3740,6 +3756,40 @@ >>> typedef typename std::make_unsigned<_DistanceType>::type __ud_type; >>> typedef typename std::uniform_int_distribution<__ud_type> __distr_type; >>> typedef typename __distr_type::param_type __p_type; >>> + >>> + typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen; >>> + typedef typename std::common_type::type __uc_type; >>> + >>> + const __uc_type __urngrange = _Gen::max() - _Gen::min() + 1; >>> + const __uc_type __urange = __uc_type(__last - __first); >>> + >>> + if (__urngrange / __urange >= __urange) >>> + // I.e. (__urngrange >= __urange * __urange) but without wrap issues. >>> + { >>> + for (_RandomAccessIterator __i = __first + 1; __i != __last; ) >>> + { >>> + const __uc_type __swap_range = __uc_type(__i - __first) + 1; >>> + >>> + if (__i + 1 == __last) >> >> Could we hoist this test out of the loop somehow? >> >> If we change the loop condition to be __i+1 < __last we don't need to >> test it on every iteration, and then after the loop we can just do >> the final swap if (__urange % 2). >> >>> + { >>> + const __uc_type __pos = __generate_random_index_below(__swap_range, __g); >>> + std::iter_swap(__i, __first + __pos); >>> + return; >>> + } >>> + >>> + // Use a single generator invocation to produce swap positions for >>> + // both of the next two elements: >>> + >>> + const __uc_type __comp_range = __swap_range * (__swap_range + 1); >>> + const __uc_type __pospos = __generate_random_index_below(__comp_range, __g); >>> + >>> + std::iter_swap(__i++, __first + (__pospos % __swap_range)); >>> + std::iter_swap(__i++, __first + (__pospos / __swap_range)); >> >> I think I've convinced myself this is correct :-) >> >> Values of __pospos will be uniformly distributed in [0, __comp_range) > > iThis is true, but ... > >> and so the / and % results will be too. > > This isn't. > > If __swap_range is 3, then __comp_range is 10 and > __pospos is uniformly distributed in [0, 9]. > > (__pospos % __swap_range) is not uniformly distributed, we get > P(0) = 0.4, P(1) = 0.3, P(2) = 0.3. > > Similarly, (__pospos / __swap_range) is not uniform, we get > P(0) = 0.3, P(1) = 0.3, P(2) = 0.3, P(3) = 0.1 > > This means that certain permuations of the input are more likely than > others, which fails to meet the requirements of the function. > > Or is there a flaw in my reasoning? Just that if __swap_range is 3, then __comp_range is 3*(3+1)=12, not 10. :) Thanks for the review! I'll send an updated patch addressing the other issues soon.