public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* where are the implementations for <random>?
@ 2017-04-09 17:31 Oliver Kullmann
  2017-04-09 18:24 ` Marc Glisse
  2018-12-21 12:34 ` wrong sha512 values for 7.4.0 ? Oliver Kullmann
  0 siblings, 2 replies; 10+ messages in thread
From: Oliver Kullmann @ 2017-04-09 17:31 UTC (permalink / raw)
  To: gcc-help

Hello,

I thought that with the facilities from <random> we had good and
well-defined ones -- but alas, only recently I realised that only say
the Mersenne twister is well-defined in the standard, but roughly
everything else is up to the implementer.

Since reproducibility is crucial for us (as with most scientific
applications), thus we need to implement ourselves the algorithms for

1. std::seed_seq
2. std::uniform_int_distribution
3. std::bernoulli_distribution (only for default p=0.5).

Easiest would be to copy (and specialise) the gcc-implementations.
I searched for it in the source-code, but couldn't find it:

Where do I find the implementations for <random> ?

Thanks for your help!

Oliver

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: where are the implementations for <random>?
  2017-04-09 17:31 where are the implementations for <random>? Oliver Kullmann
@ 2017-04-09 18:24 ` Marc Glisse
  2017-04-09 23:32   ` Martin Sebor
  2018-12-21 12:34 ` wrong sha512 values for 7.4.0 ? Oliver Kullmann
  1 sibling, 1 reply; 10+ messages in thread
From: Marc Glisse @ 2017-04-09 18:24 UTC (permalink / raw)
  To: Oliver Kullmann; +Cc: gcc-help

On Sun, 9 Apr 2017, Oliver Kullmann wrote:

> Hello,
>
> I thought that with the facilities from <random> we had good and
> well-defined ones -- but alas, only recently I realised that only say
> the Mersenne twister is well-defined in the standard, but roughly
> everything else is up to the implementer.
>
> Since reproducibility is crucial for us (as with most scientific
> applications), thus we need to implement ourselves the algorithms for
>
> 1. std::seed_seq
> 2. std::uniform_int_distribution
> 3. std::bernoulli_distribution (only for default p=0.5).
>
> Easiest would be to copy (and specialise) the gcc-implementations.
> I searched for it in the source-code, but couldn't find it:
>
> Where do I find the implementations for <random> ?

Option 1)
First you find the file called "random" (/usr/include/c++/6/random here), 
then you notice that it contains
#include <bits/random.h>
#include <bits/opt_random.h>
#include <bits/random.tcc>
so you look for those files (and recursively if needed)

Option 2)
You compile a file that contains #include <random> using the -E flag, and 
look for bernoulli_distribution or whatever in the output.

Option 3)
Get the sources for gcc.
grep -r bernoulli_distribution /the/sources

They all point to bits/random.h (with one function implemented in 
bits/random.tcc), with older versions in tr1/.

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: where are the implementations for <random>?
  2017-04-09 18:24 ` Marc Glisse
@ 2017-04-09 23:32   ` Martin Sebor
  2017-04-11 13:03     ` Oliver Kullmann
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Sebor @ 2017-04-09 23:32 UTC (permalink / raw)
  To: gcc-help, Oliver Kullmann

On 04/09/2017 12:23 PM, Marc Glisse wrote:
> On Sun, 9 Apr 2017, Oliver Kullmann wrote:
>
>> Hello,
>>
>> I thought that with the facilities from <random> we had good and
>> well-defined ones -- but alas, only recently I realised that only say
>> the Mersenne twister is well-defined in the standard, but roughly
>> everything else is up to the implementer.
>>
>> Since reproducibility is crucial for us (as with most scientific
>> applications), thus we need to implement ourselves the algorithms for
>>
>> 1. std::seed_seq
>> 2. std::uniform_int_distribution
>> 3. std::bernoulli_distribution (only for default p=0.5).
>>
>> Easiest would be to copy (and specialise) the gcc-implementations.

It's worth keeping in mind that copying libstdc++ code will have
licensing implications.  A different caveat is that conforming C++
programs can define explicit specializations of standard library
templates only for user-defined types.  I.e., it's not valid to
define one's own std::uniform_int_ditribution class template or
specialization of it on a fundamental type such as int.

>> I searched for it in the source-code, but couldn't find it:
>>
>> Where do I find the implementations for <random> ?
>
> Option 1)
> First you find the file called "random" (/usr/include/c++/6/random
> here), then you notice that it contains
> #include <bits/random.h>
> #include <bits/opt_random.h>
> #include <bits/random.tcc>
> so you look for those files (and recursively if needed)
>
> Option 2)
> You compile a file that contains #include <random> using the -E flag,
> and look for bernoulli_distribution or whatever in the output.
>
> Option 3)
> Get the sources for gcc.
> grep -r bernoulli_distribution /the/sources

Another option is to browse the sources online:

https://gcc.gnu.org/viewcvs/gcc/trunk/libstdc%2B%2B-v3/include/bits/random.h?view=markup

Martin

>
> They all point to bits/random.h (with one function implemented in
> bits/random.tcc), with older versions in tr1/.
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: where are the implementations for <random>?
  2017-04-09 23:32   ` Martin Sebor
@ 2017-04-11 13:03     ` Oliver Kullmann
  2017-04-11 18:26       ` Oliver Kullmann
  0 siblings, 1 reply; 10+ messages in thread
From: Oliver Kullmann @ 2017-04-11 13:03 UTC (permalink / raw)
  To: Martin Sebor; +Cc: gcc-help

thanks for the info.

My first, little, basically trivially success, is replacing the
Bernoulli-distribution for the default case:

  inline bool bernoulli(std::mt19937_64& g) noexcept {
    typedef std::mt19937_64::result_type uint_t;
    constexpr uint_t half = std::numeric_limits<uint_t>::max() / 2;
    return g() < half;
  }

Checked that it reproduces the behaviour of gcc 4.7.4 and 5.4.

Replacing std::uniform_int_distribution (only for std::mt19937_64) seems harder,
and yet I have problems understanding the code.

One could just generalise the above, and since only 32-bits intervals
are used, that likely would suffice for a decent distribution.

However g++ seems to do something more advanced.

On Sun, Apr 09, 2017 at 05:31:54PM -0600, Martin Sebor wrote:
> It's worth keeping in mind that copying libstdc++ code will have
> licensing implications.  A different caveat is that conforming C++
> programs can define explicit specializations of standard library
> templates only for user-defined types.  I.e., it's not valid to
> define one's own std::uniform_int_ditribution class template or
> specialization of it on a fundamental type such as int.
>

I just want to extract the mathematical content, and then write simple
code for it, like above.

If somebody knows some underlying papers describing the algorithms for
uniform_int_distribution and seeding std::mt19937_64, that would be great.

> Another option is to browse the sources online:
> 
> https://gcc.gnu.org/viewcvs/gcc/trunk/libstdc%2B%2B-v3/include/bits/random.h?view=markup
>

Thanks, I found that most useful.

Oliver

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: where are the implementations for <random>?
  2017-04-11 13:03     ` Oliver Kullmann
@ 2017-04-11 18:26       ` Oliver Kullmann
  2017-04-11 20:17         ` Oliver Kullmann
  0 siblings, 1 reply; 10+ messages in thread
From: Oliver Kullmann @ 2017-04-11 18:26 UTC (permalink / raw)
  To: Martin Sebor; +Cc: gcc-help

Second success: Uniform integer distribution from 1,...,n:

  typedef std::mt19937_64::result_type uint_t;
  // Returns element of 1,...,n with uniform probability:
  class Uniform {
    std::mt19937_64& g;
    const uint_t n;
    const uint_t scaling;
    const uint_t past;
  public :
    Uniform(std::mt19937_64& g, const uint_t n) : g(g), n(n), scaling(std::numeric_limits<uint_t>::max() / n), past(n * scaling) {}
    gen_uint_t operator ()() {
      gen_uint_t result;
      do result = g(); while (result >= past);
      return result/scaling + 1;
    }
  };

Now only missing seeding for std::mt19937_64.

This seems most complicated.
Apparently it is in
https://gcc.gnu.org/viewcvs/gcc/trunk/libstdc%2B%2B-v3/include/bits/random.tcc?revision=246542&view=markup

Lines 346 - 389.

I would guess that this functionality is most specialised, and most likely to be different
for different compilers.

Really a pitty that this was not standardised.

Oliver

On Tue, Apr 11, 2017 at 02:02:42PM +0100, Oliver Kullmann wrote:
> thanks for the info.
> 
> My first, little, basically trivially success, is replacing the
> Bernoulli-distribution for the default case:
> 
>   inline bool bernoulli(std::mt19937_64& g) noexcept {
>     typedef std::mt19937_64::result_type uint_t;
>     constexpr uint_t half = std::numeric_limits<uint_t>::max() / 2;
>     return g() < half;
>   }
> 
> Checked that it reproduces the behaviour of gcc 4.7.4 and 5.4.
> 
> Replacing std::uniform_int_distribution (only for std::mt19937_64) seems harder,
> and yet I have problems understanding the code.
> 
> One could just generalise the above, and since only 32-bits intervals
> are used, that likely would suffice for a decent distribution.
> 
> However g++ seems to do something more advanced.
> 
> On Sun, Apr 09, 2017 at 05:31:54PM -0600, Martin Sebor wrote:
> > It's worth keeping in mind that copying libstdc++ code will have
> > licensing implications.  A different caveat is that conforming C++
> > programs can define explicit specializations of standard library
> > templates only for user-defined types.  I.e., it's not valid to
> > define one's own std::uniform_int_ditribution class template or
> > specialization of it on a fundamental type such as int.
> >
> 
> I just want to extract the mathematical content, and then write simple
> code for it, like above.
> 
> If somebody knows some underlying papers describing the algorithms for
> uniform_int_distribution and seeding std::mt19937_64, that would be great.
> 
> > Another option is to browse the sources online:
> > 
> > https://gcc.gnu.org/viewcvs/gcc/trunk/libstdc%2B%2B-v3/include/bits/random.h?view=markup
> >
> 
> Thanks, I found that most useful.
> 
> Oliver
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: where are the implementations for <random>?
  2017-04-11 18:26       ` Oliver Kullmann
@ 2017-04-11 20:17         ` Oliver Kullmann
  2017-04-11 20:27           ` Jonathan Wakely
  0 siblings, 1 reply; 10+ messages in thread
From: Oliver Kullmann @ 2017-04-11 20:17 UTC (permalink / raw)
  To: Martin Sebor; +Cc: gcc-help

> Now only missing seeding for std::mt19937_64.
> 
> This seems most complicated.
> Apparently it is in
> https://gcc.gnu.org/viewcvs/gcc/trunk/libstdc%2B%2B-v3/include/bits/random.tcc?revision=246542&view=markup
> 
> Lines 346 - 389.
> 
> I would guess that this functionality is most specialised, and most likely to be different
> for different compilers.
> 
> Really a pitty that this was not standardised.
> 

The code at Lines 346 - 389 for the seed-member of the
mersenne_twister_engine doesn't look too bad except of two aspects: What
is

 - __detail::_Shift ?
 - __detail::__mod  ?

It seems that apart from these functions, everything else is just
arithmetic of bit-logic.

And the creation of that sequence in Line 360:
  __q.generate(__arr + 0, __arr + __n * __k);
actually seems to be defined by the standard, so that can be used.

ACTUALLY, I just realise that even if I can code the above myself, I
couldn't use the Mersenne twister from the standard library, since
apparently the only way of setting the internal state is by
*constants*??

All what there is seems to be
http://www.cplusplus.com/reference/random/mersenne_twister_engine/
template <class UIntType, size_t w, size_t n, size_t m, size_t r,
          UIntType a, size_t u, UIntType d, size_t s,
          UIntType b, size_t t,
          UIntType c, size_t l, UIntType f>
  class mersenne_twister_engine;
and that has the state, which is set by the seed-member-function, only as template parameter.

So I need to implement the Mersenne twister apparently myself.
Or perhaps I don't understand that yet properly.

Perhaps the seed function has nothing to do with those template parameters, which are the
parameters of the engine, while the seed sets only the "internal state", and that perhaps
can be done explicitly by calling some std-library facility?

Would be thankfor for any help.

Oliver

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: where are the implementations for <random>?
  2017-04-11 20:17         ` Oliver Kullmann
@ 2017-04-11 20:27           ` Jonathan Wakely
  2017-04-12  3:41             ` Oliver Kullmann
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Wakely @ 2017-04-11 20:27 UTC (permalink / raw)
  To: Oliver Kullmann; +Cc: Martin Sebor, gcc-help

On 11 April 2017 at 21:17, Oliver Kullmann <o.kullmann@swansea.ac.uk> wrote:
>> Now only missing seeding for std::mt19937_64.
>>
>> This seems most complicated.
>> Apparently it is in
>> https://gcc.gnu.org/viewcvs/gcc/trunk/libstdc%2B%2B-v3/include/bits/random.tcc?revision=246542&view=markup
>>
>> Lines 346 - 389.
>>
>> I would guess that this functionality is most specialised, and most likely to be different
>> for different compilers.
>>
>> Really a pitty that this was not standardised.
>>
>
> The code at Lines 346 - 389 for the seed-member of the
> mersenne_twister_engine doesn't look too bad except of two aspects: What
> is
>
>  - __detail::_Shift ?
>  - __detail::__mod  ?
>
> It seems that apart from these functions, everything else is just
> arithmetic of bit-logic.
>
> And the creation of that sequence in Line 360:
>   __q.generate(__arr + 0, __arr + __n * __k);
> actually seems to be defined by the standard, so that can be used.
>
> ACTUALLY, I just realise that even if I can code the above myself, I
> couldn't use the Mersenne twister from the standard library, since
> apparently the only way of setting the internal state is by
> *constants*??
>
> All what there is seems to be
> http://www.cplusplus.com/reference/random/mersenne_twister_engine/

Use a better reference. http://en.cppreference.com is much better than
cplusplus.com

> template <class UIntType, size_t w, size_t n, size_t m, size_t r,
>           UIntType a, size_t u, UIntType d, size_t s,
>           UIntType b, size_t t,
>           UIntType c, size_t l, UIntType f>
>   class mersenne_twister_engine;
> and that has the state, which is set by the seed-member-function, only as template parameter.
>
> So I need to implement the Mersenne twister apparently myself.
> Or perhaps I don't understand that yet properly.
>
> Perhaps the seed function has nothing to do with those template parameters, which are the
> parameters of the engine, while the seed sets only the "internal state", and that perhaps
> can be done explicitly by calling some std-library facility?

I don't understand what you're trying to do or what the difficulty is.

The output of the mersenne_twister_engine is specified by the
standard, there should be no need to reimplement it. If you don't get
repeatable values for any implementation of mersenne_twister_engine
then it's a bug in that implementation.

The template arguments are just parameters for the internal algorithm,
so different parameters create different outputs. std::mt19937 is
simply a specialization of the mersenne_twister_engine with parameters
that are known to produce good pseudo-random numbers.

The behaviour of uniform_int_distribution is also specified by the
standard, but only the mathematical properties, not a specific
implementation. That means different implementations can give
different outputs. If you use your own uniform_int_distribution then
you ensure repeatable outputs across implementations. I repeat, there
should be no need to do that with the engines themselves.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: where are the implementations for <random>?
  2017-04-11 20:27           ` Jonathan Wakely
@ 2017-04-12  3:41             ` Oliver Kullmann
  2017-04-12  5:01               ` Oliver Kullmann
  0 siblings, 1 reply; 10+ messages in thread
From: Oliver Kullmann @ 2017-04-12  3:41 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Martin Sebor, gcc-help

On Tue, Apr 11, 2017 at 09:27:53PM +0100, Jonathan Wakely wrote:
> > All what there is seems to be
> > http://www.cplusplus.com/reference/random/mersenne_twister_engine/
> 
> Use a better reference. http://en.cppreference.com is much better than
> cplusplus.com
>

Actually I also noted that some time ago (the above was just what came
out first from the search engine (unfortunately)).

> I don't understand what you're trying to do or what the difficulty is.
> 
> The output of the mersenne_twister_engine is specified by the
> standard, there should be no need to reimplement it. If you don't get
> repeatable values for any implementation of mersenne_twister_engine
> then it's a bug in that implementation.
> 
> The template arguments are just parameters for the internal algorithm,
> so different parameters create different outputs. std::mt19937 is
> simply a specialization of the mersenne_twister_engine with parameters
> that are known to produce good pseudo-random numbers.
> 
> The behaviour of uniform_int_distribution is also specified by the
> standard, but only the mathematical properties, not a specific
> implementation. That means different implementations can give
> different outputs. If you use your own uniform_int_distribution then
> you ensure repeatable outputs across implementations. I repeat, there
> should be no need to do that with the engines themselves.

Consider the following program MT.cpp:

#include <iostream>
#include <random>
#include <vector>
#include <cstdint>

int main() {
  typedef std::mt19937_64::result_type ui;
  std::seed_seq s {ui(10),ui(10),ui(0)};
  std::vector<std::uint32_t> seeds(3);
  s.generate(seeds.begin(), seeds.end());
  for (const auto x : seeds) std::cout << x << " ";
  std::cout << "\n";
  std::mt19937_64 g(s);
  std::cout << g() << " " << g() << "\n";
}

Compiled with
> g++ --std=c++11 -Wall MT.cpp 
it yields with gcc 4.7.4 and gcc 5.4

> ./a.out 
927367489 2598207009 681269367 
14538134740155241067 15440504459514664889

As far as I understand it, only the first line is defined by the standard,
but not the second: how the seed-sequence is used by the generator g is up
to the compiler.

Thus the need to do the seeding ourselves.

Oliver

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: where are the implementations for <random>?
  2017-04-12  3:41             ` Oliver Kullmann
@ 2017-04-12  5:01               ` Oliver Kullmann
  0 siblings, 0 replies; 10+ messages in thread
From: Oliver Kullmann @ 2017-04-12  5:01 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Martin Sebor, gcc-help

I found

  http://eel.is/c++draft/rand.eng.mers

which, if it has been adopted by the Standard, seems in Points 6, 8 actually
to fully specify the seeding, from initial values as well as from
sequents (std::seed_seq), and thus, hopefully, actually seeding in both
forms seems well-defined by the standard. (Otherwise it would be a
strange oversight.)

Oliver

P.S. And it seems that the GCC implementation does what is described in
that document.


> Consider the following program MT.cpp:
> 
> #include <iostream>
> #include <random>
> #include <vector>
> #include <cstdint>
> 
> int main() {
>   typedef std::mt19937_64::result_type ui;
>   std::seed_seq s {ui(10),ui(10),ui(0)};
>   std::vector<std::uint32_t> seeds(3);
>   s.generate(seeds.begin(), seeds.end());
>   for (const auto x : seeds) std::cout << x << " ";
>   std::cout << "\n";
>   std::mt19937_64 g(s);
>   std::cout << g() << " " << g() << "\n";
> }
> 
> Compiled with
> > g++ --std=c++11 -Wall MT.cpp 
> it yields with gcc 4.7.4 and gcc 5.4
> 
> > ./a.out 
> 927367489 2598207009 681269367 
> 14538134740155241067 15440504459514664889
> 
> As far as I understand it, only the first line is defined by the standard,
> but not the second: how the seed-sequence is used by the generator g is up
> to the compiler.
> 
> Thus the need to do the seeding ourselves.
> 
> Oliver
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* wrong sha512 values for 7.4.0 ?
  2017-04-09 17:31 where are the implementations for <random>? Oliver Kullmann
  2017-04-09 18:24 ` Marc Glisse
@ 2018-12-21 12:34 ` Oliver Kullmann
  1 sibling, 0 replies; 10+ messages in thread
From: Oliver Kullmann @ 2018-12-21 12:34 UTC (permalink / raw)
  To: gcc-help

Hello,

after download from various mirror sites I get

  > sha512sum gcc-7.4.0.tar.gz 
  6824b5c8fdb3151d8dd517911d3d975f7808525f52db32b5c25e9354b562792d6d2f1e8cc5aa019ff250df65b4f29b43f65ab6d769a070fd0015b13a3a9d6bf9
  gcc-7.4.0.tar.gz
 
however e.g.

  ftp://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/gcc-7.4.0/sha512.sum

states:

  a86d02c2c880f46dbed04f5da8d18d9555f980c9e3b867770b2f091cf8fdba55fea34962128bd1d1e11e3694ec188d0284963c1df5d0e4a51c2da7b56cde3625
  gcc-7.4.0.tar.gz

??

Oliver

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-12-20 18:58 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-09 17:31 where are the implementations for <random>? Oliver Kullmann
2017-04-09 18:24 ` Marc Glisse
2017-04-09 23:32   ` Martin Sebor
2017-04-11 13:03     ` Oliver Kullmann
2017-04-11 18:26       ` Oliver Kullmann
2017-04-11 20:17         ` Oliver Kullmann
2017-04-11 20:27           ` Jonathan Wakely
2017-04-12  3:41             ` Oliver Kullmann
2017-04-12  5:01               ` Oliver Kullmann
2018-12-21 12:34 ` wrong sha512 values for 7.4.0 ? Oliver Kullmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).