* [Patch, libfortran] Multi-threaded random_number @ 2016-07-24 13:45 Janne Blomqvist 2016-08-01 10:42 ` Janne Blomqvist 2016-08-08 11:01 ` Janne Blomqvist 0 siblings, 2 replies; 12+ messages in thread From: Janne Blomqvist @ 2016-07-24 13:45 UTC (permalink / raw) To: Fortran List, GCC Patches [-- Attachment #1: Type: text/plain, Size: 2264 bytes --] Hi, the attached patch replaces the current random_number / random_seed implementations with an implementation that better supports threads. It's an improved version of the RFC patch I posted earlier at https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02110.html . Please see that earlier message for a longer-winded explanation of what's wrong with the current implementation and how the patch addresses this. In short, with the patch the random number generator state is now per-thread and stored in a per-thread (TLS) variable, enabling a lockless fast-path. This provides up to 2 orders of magnitude better performance on a synthetic benchmark using 4 threads, and provides a more deterministic result as the order that threads are scheduled does not affect the random number streams for each thread. Compared to the RFC patch, a number of minor and not-so-minor bugs have been fixed, so the patch now passes the testsuite (with a few modifications to the suite, part of the patch). Also, for REAL kinds 4, 8, 10 the generated streams are identical (except precision, of course) (like the current implementation), enabling precision comparisons, as requested by Steve Kargl. However, this does not extend to REAL(16) as that would have necessitated doubling the size of the state, along with potential issues of slower escape from a low-entropy state, for a feature that I believe is not used by particularly many users in the end. So if one wants to do precision comparisons with REAL(16) one must employ a wrapper routine. Regtested on x86_64-pc-linux-gnu, Ok for trunk? frontend ChangeLog: 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> * check.c (gfc_check_random_seed): Use new seed size in check. * intrinsic.texi (RANDOM_NUMBER): Updated documentation. (RANDOM_SEED): Likewise. testsuite: 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> * gfortran.dg/random_7.f90: Take into account that the last seed value is the special p value. * gfortran.dg/random_seed_1.f90: Seed size is now constant. libgfortran: 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> * intrinsics/random.c: Replace KISS with xorshift1024* with per-thread (TLS) state. * runtime/main.c (init): Don't call random_seed_i4. -- Janne Blomqvist [-- Attachment #2: random_threads.diff.gz --] [-- Type: application/x-gzip, Size: 10253 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch, libfortran] Multi-threaded random_number 2016-07-24 13:45 [Patch, libfortran] Multi-threaded random_number Janne Blomqvist @ 2016-08-01 10:42 ` Janne Blomqvist 2016-08-01 10:48 ` Andreas Schwab 2016-08-08 11:01 ` Janne Blomqvist 1 sibling, 1 reply; 12+ messages in thread From: Janne Blomqvist @ 2016-08-01 10:42 UTC (permalink / raw) To: Fortran List, GCC Patches PING? Also, a minor bugfix on top of the previous patch: diff --git a/libgfortran/intrinsics/random.c b/libgfortran/intrinsics/random.c index 9b54a02..21395ea 100644 --- a/libgfortran/intrinsics/random.c +++ b/libgfortran/intrinsics/random.c @@ -787,7 +787,7 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get) init_rand_state (true); /* Copy p & 15 */ - rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] & 15; + rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] % 15; } @@ -862,7 +862,7 @@ random_seed_i8 (GFC_INTEGER_8 *size, gfc_array_i8 *put, gfc_array_i8 *get) njumps = 0; init_rand_state (true); - rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] & 15; + rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] % 15; } On Sun, Jul 24, 2016 at 4:45 PM, Janne Blomqvist <blomqvist.janne@gmail.com> wrote: > Hi, > > the attached patch replaces the current random_number / random_seed > implementations with an implementation that better supports threads. > It's an improved version of the RFC patch I posted earlier at > https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02110.html . Please see > that earlier message for a longer-winded explanation of what's wrong > with the current implementation and how the patch addresses this. > > In short, with the patch the random number generator state is now > per-thread and stored in a per-thread (TLS) variable, enabling a > lockless fast-path. This provides up to 2 orders of magnitude better > performance on a synthetic benchmark using 4 threads, and provides a > more deterministic result as the order that threads are scheduled does > not affect the random number streams for each thread. > > Compared to the RFC patch, a number of minor and not-so-minor bugs > have been fixed, so the patch now passes the testsuite (with a few > modifications to the suite, part of the patch). Also, for REAL kinds > 4, 8, 10 the generated streams are identical (except precision, of > course) (like the current implementation), enabling precision > comparisons, as requested by Steve Kargl. However, this does not > extend to REAL(16) as that would have necessitated doubling the size > of the state, along with potential issues of slower escape from a > low-entropy state, for a feature that I believe is not used by > particularly many users in the end. So if one wants to do precision > comparisons with REAL(16) one must employ a wrapper routine. > > Regtested on x86_64-pc-linux-gnu, Ok for trunk? > > frontend ChangeLog: > > 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> > > * check.c (gfc_check_random_seed): Use new seed size in check. > * intrinsic.texi (RANDOM_NUMBER): Updated documentation. > (RANDOM_SEED): Likewise. > > > testsuite: > > 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> > > * gfortran.dg/random_7.f90: Take into account that the last seed > value is the special p value. > * gfortran.dg/random_seed_1.f90: Seed size is now constant. > > > libgfortran: > 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> > > * intrinsics/random.c: Replace KISS with xorshift1024* with > per-thread (TLS) state. > * runtime/main.c (init): Don't call random_seed_i4. > > > -- > Janne Blomqvist -- Janne Blomqvist ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch, libfortran] Multi-threaded random_number 2016-08-01 10:42 ` Janne Blomqvist @ 2016-08-01 10:48 ` Andreas Schwab 2016-08-01 11:00 ` Janne Blomqvist 0 siblings, 1 reply; 12+ messages in thread From: Andreas Schwab @ 2016-08-01 10:48 UTC (permalink / raw) To: Janne Blomqvist; +Cc: Fortran List, GCC Patches On Mo, Aug 01 2016, Janne Blomqvist <blomqvist.janne@gmail.com> wrote: > PING? > > Also, a minor bugfix on top of the previous patch: > > diff --git a/libgfortran/intrinsics/random.c b/libgfortran/intrinsics/random.c > index 9b54a02..21395ea 100644 > --- a/libgfortran/intrinsics/random.c > +++ b/libgfortran/intrinsics/random.c > @@ -787,7 +787,7 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 > *put, gfc_array_i4 *get) > init_rand_state (true); > > /* Copy p & 15 */ > - rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] & 15; > + rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] % 15; The comment no longer fits. Andreas. -- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different." ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch, libfortran] Multi-threaded random_number 2016-08-01 10:48 ` Andreas Schwab @ 2016-08-01 11:00 ` Janne Blomqvist 0 siblings, 0 replies; 12+ messages in thread From: Janne Blomqvist @ 2016-08-01 11:00 UTC (permalink / raw) To: Andreas Schwab; +Cc: Fortran List, GCC Patches On Mon, Aug 1, 2016 at 1:48 PM, Andreas Schwab <schwab@suse.de> wrote: > On Mo, Aug 01 2016, Janne Blomqvist <blomqvist.janne@gmail.com> wrote: > >> PING? >> >> Also, a minor bugfix on top of the previous patch: >> >> diff --git a/libgfortran/intrinsics/random.c b/libgfortran/intrinsics/random.c >> index 9b54a02..21395ea 100644 >> --- a/libgfortran/intrinsics/random.c >> +++ b/libgfortran/intrinsics/random.c >> @@ -787,7 +787,7 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 >> *put, gfc_array_i4 *get) >> init_rand_state (true); >> >> /* Copy p & 15 */ >> - rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] & 15; >> + rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] % 15; > > The comment no longer fits. Indeed, thanks. I've fixed it in my own tree. -- Janne Blomqvist ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch, libfortran] Multi-threaded random_number 2016-07-24 13:45 [Patch, libfortran] Multi-threaded random_number Janne Blomqvist 2016-08-01 10:42 ` Janne Blomqvist @ 2016-08-08 11:01 ` Janne Blomqvist 2016-08-09 5:01 ` Jerry DeLisle 1 sibling, 1 reply; 12+ messages in thread From: Janne Blomqvist @ 2016-08-08 11:01 UTC (permalink / raw) To: Fortran List, GCC Patches PING**2 On Sun, Jul 24, 2016 at 4:45 PM, Janne Blomqvist <blomqvist.janne@gmail.com> wrote: > Hi, > > the attached patch replaces the current random_number / random_seed > implementations with an implementation that better supports threads. > It's an improved version of the RFC patch I posted earlier at > https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02110.html . Please see > that earlier message for a longer-winded explanation of what's wrong > with the current implementation and how the patch addresses this. > > In short, with the patch the random number generator state is now > per-thread and stored in a per-thread (TLS) variable, enabling a > lockless fast-path. This provides up to 2 orders of magnitude better > performance on a synthetic benchmark using 4 threads, and provides a > more deterministic result as the order that threads are scheduled does > not affect the random number streams for each thread. > > Compared to the RFC patch, a number of minor and not-so-minor bugs > have been fixed, so the patch now passes the testsuite (with a few > modifications to the suite, part of the patch). Also, for REAL kinds > 4, 8, 10 the generated streams are identical (except precision, of > course) (like the current implementation), enabling precision > comparisons, as requested by Steve Kargl. However, this does not > extend to REAL(16) as that would have necessitated doubling the size > of the state, along with potential issues of slower escape from a > low-entropy state, for a feature that I believe is not used by > particularly many users in the end. So if one wants to do precision > comparisons with REAL(16) one must employ a wrapper routine. > > Regtested on x86_64-pc-linux-gnu, Ok for trunk? > > frontend ChangeLog: > > 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> > > * check.c (gfc_check_random_seed): Use new seed size in check. > * intrinsic.texi (RANDOM_NUMBER): Updated documentation. > (RANDOM_SEED): Likewise. > > > testsuite: > > 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> > > * gfortran.dg/random_7.f90: Take into account that the last seed > value is the special p value. > * gfortran.dg/random_seed_1.f90: Seed size is now constant. > > > libgfortran: > 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> > > * intrinsics/random.c: Replace KISS with xorshift1024* with > per-thread (TLS) state. > * runtime/main.c (init): Don't call random_seed_i4. > > > -- > Janne Blomqvist -- Janne Blomqvist ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch, libfortran] Multi-threaded random_number 2016-08-08 11:01 ` Janne Blomqvist @ 2016-08-09 5:01 ` Jerry DeLisle 2016-08-10 11:12 ` Janne Blomqvist 0 siblings, 1 reply; 12+ messages in thread From: Jerry DeLisle @ 2016-08-09 5:01 UTC (permalink / raw) To: Janne Blomqvist, Fortran List, GCC Patches On 08/08/2016 04:01 AM, Janne Blomqvist wrote: > PING**2 > OK, thanks for patch. Jerry > On Sun, Jul 24, 2016 at 4:45 PM, Janne Blomqvist > <blomqvist.janne@gmail.com> wrote: >> Hi, >> >> the attached patch replaces the current random_number / random_seed >> implementations with an implementation that better supports threads. >> It's an improved version of the RFC patch I posted earlier at >> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02110.html . Please see >> that earlier message for a longer-winded explanation of what's wrong >> with the current implementation and how the patch addresses this. >> >> In short, with the patch the random number generator state is now >> per-thread and stored in a per-thread (TLS) variable, enabling a >> lockless fast-path. This provides up to 2 orders of magnitude better >> performance on a synthetic benchmark using 4 threads, and provides a >> more deterministic result as the order that threads are scheduled does >> not affect the random number streams for each thread. >> >> Compared to the RFC patch, a number of minor and not-so-minor bugs >> have been fixed, so the patch now passes the testsuite (with a few >> modifications to the suite, part of the patch). Also, for REAL kinds >> 4, 8, 10 the generated streams are identical (except precision, of >> course) (like the current implementation), enabling precision >> comparisons, as requested by Steve Kargl. However, this does not >> extend to REAL(16) as that would have necessitated doubling the size >> of the state, along with potential issues of slower escape from a >> low-entropy state, for a feature that I believe is not used by >> particularly many users in the end. So if one wants to do precision >> comparisons with REAL(16) one must employ a wrapper routine. >> >> Regtested on x86_64-pc-linux-gnu, Ok for trunk? >> >> frontend ChangeLog: >> >> 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> >> >> * check.c (gfc_check_random_seed): Use new seed size in check. >> * intrinsic.texi (RANDOM_NUMBER): Updated documentation. >> (RANDOM_SEED): Likewise. >> >> >> testsuite: >> >> 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> >> >> * gfortran.dg/random_7.f90: Take into account that the last seed >> value is the special p value. >> * gfortran.dg/random_seed_1.f90: Seed size is now constant. >> >> >> libgfortran: >> 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> >> >> * intrinsics/random.c: Replace KISS with xorshift1024* with >> per-thread (TLS) state. >> * runtime/main.c (init): Don't call random_seed_i4. >> >> >> -- >> Janne Blomqvist > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch, libfortran] Multi-threaded random_number 2016-08-09 5:01 ` Jerry DeLisle @ 2016-08-10 11:12 ` Janne Blomqvist 2016-08-11 9:02 ` Janne Blomqvist 0 siblings, 1 reply; 12+ messages in thread From: Janne Blomqvist @ 2016-08-10 11:12 UTC (permalink / raw) To: Jerry DeLisle; +Cc: Fortran List, GCC Patches Hi, thanks for the Ok. However, moments before committing I got cold feet and started digging around; it unfortunately seems that TLS (_Thread_local) is not supported on all targets. So I'll have to copy-paste some configure magic from libgomp/libjava/etc., and provide workarounds for such systems. :( On Tue, Aug 9, 2016 at 8:01 AM, Jerry DeLisle <jvdelisle@charter.net> wrote: > On 08/08/2016 04:01 AM, Janne Blomqvist wrote: >> >> PING**2 >> > > OK, thanks for patch. > > Jerry > >> On Sun, Jul 24, 2016 at 4:45 PM, Janne Blomqvist >> <blomqvist.janne@gmail.com> wrote: >>> >>> Hi, >>> >>> the attached patch replaces the current random_number / random_seed >>> implementations with an implementation that better supports threads. >>> It's an improved version of the RFC patch I posted earlier at >>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02110.html . Please see >>> that earlier message for a longer-winded explanation of what's wrong >>> with the current implementation and how the patch addresses this. >>> >>> In short, with the patch the random number generator state is now >>> per-thread and stored in a per-thread (TLS) variable, enabling a >>> lockless fast-path. This provides up to 2 orders of magnitude better >>> performance on a synthetic benchmark using 4 threads, and provides a >>> more deterministic result as the order that threads are scheduled does >>> not affect the random number streams for each thread. >>> >>> Compared to the RFC patch, a number of minor and not-so-minor bugs >>> have been fixed, so the patch now passes the testsuite (with a few >>> modifications to the suite, part of the patch). Also, for REAL kinds >>> 4, 8, 10 the generated streams are identical (except precision, of >>> course) (like the current implementation), enabling precision >>> comparisons, as requested by Steve Kargl. However, this does not >>> extend to REAL(16) as that would have necessitated doubling the size >>> of the state, along with potential issues of slower escape from a >>> low-entropy state, for a feature that I believe is not used by >>> particularly many users in the end. So if one wants to do precision >>> comparisons with REAL(16) one must employ a wrapper routine. >>> >>> Regtested on x86_64-pc-linux-gnu, Ok for trunk? >>> >>> frontend ChangeLog: >>> >>> 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> >>> >>> * check.c (gfc_check_random_seed): Use new seed size in check. >>> * intrinsic.texi (RANDOM_NUMBER): Updated documentation. >>> (RANDOM_SEED): Likewise. >>> >>> >>> testsuite: >>> >>> 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> >>> >>> * gfortran.dg/random_7.f90: Take into account that the last seed >>> value is the special p value. >>> * gfortran.dg/random_seed_1.f90: Seed size is now constant. >>> >>> >>> libgfortran: >>> 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> >>> >>> * intrinsics/random.c: Replace KISS with xorshift1024* with >>> per-thread (TLS) state. >>> * runtime/main.c (init): Don't call random_seed_i4. >>> >>> >>> -- >>> Janne Blomqvist >> >> >> >> > -- Janne Blomqvist ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch, libfortran] Multi-threaded random_number 2016-08-10 11:12 ` Janne Blomqvist @ 2016-08-11 9:02 ` Janne Blomqvist 2016-08-11 14:55 ` Rainer Orth 2016-08-12 10:07 ` Thomas Schwinge 0 siblings, 2 replies; 12+ messages in thread From: Janne Blomqvist @ 2016-08-11 9:02 UTC (permalink / raw) To: Jerry DeLisle; +Cc: Fortran List, GCC Patches [-- Attachment #1: Type: text/plain, Size: 3639 bytes --] Hi, committed a slightly modified patch as r239356. Changes from the submitted patch attached. To my surprise, it turned out that my fallback code using __gthread_key_{create,delete} and __ghtread_{get,set}_specific was faster than my TLS code, so I removed the TLS configure magic and the TLS code and just left the __gthread stuff in. On Wed, Aug 10, 2016 at 2:11 PM, Janne Blomqvist <blomqvist.janne@gmail.com> wrote: > Hi, > > thanks for the Ok. However, moments before committing I got cold feet > and started digging around; it unfortunately seems that TLS > (_Thread_local) is not supported on all targets. So I'll have to > copy-paste some configure magic from libgomp/libjava/etc., and provide > workarounds for such systems. :( > > On Tue, Aug 9, 2016 at 8:01 AM, Jerry DeLisle <jvdelisle@charter.net> wrote: >> On 08/08/2016 04:01 AM, Janne Blomqvist wrote: >>> >>> PING**2 >>> >> >> OK, thanks for patch. >> >> Jerry >> >>> On Sun, Jul 24, 2016 at 4:45 PM, Janne Blomqvist >>> <blomqvist.janne@gmail.com> wrote: >>>> >>>> Hi, >>>> >>>> the attached patch replaces the current random_number / random_seed >>>> implementations with an implementation that better supports threads. >>>> It's an improved version of the RFC patch I posted earlier at >>>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02110.html . Please see >>>> that earlier message for a longer-winded explanation of what's wrong >>>> with the current implementation and how the patch addresses this. >>>> >>>> In short, with the patch the random number generator state is now >>>> per-thread and stored in a per-thread (TLS) variable, enabling a >>>> lockless fast-path. This provides up to 2 orders of magnitude better >>>> performance on a synthetic benchmark using 4 threads, and provides a >>>> more deterministic result as the order that threads are scheduled does >>>> not affect the random number streams for each thread. >>>> >>>> Compared to the RFC patch, a number of minor and not-so-minor bugs >>>> have been fixed, so the patch now passes the testsuite (with a few >>>> modifications to the suite, part of the patch). Also, for REAL kinds >>>> 4, 8, 10 the generated streams are identical (except precision, of >>>> course) (like the current implementation), enabling precision >>>> comparisons, as requested by Steve Kargl. However, this does not >>>> extend to REAL(16) as that would have necessitated doubling the size >>>> of the state, along with potential issues of slower escape from a >>>> low-entropy state, for a feature that I believe is not used by >>>> particularly many users in the end. So if one wants to do precision >>>> comparisons with REAL(16) one must employ a wrapper routine. >>>> >>>> Regtested on x86_64-pc-linux-gnu, Ok for trunk? >>>> >>>> frontend ChangeLog: >>>> >>>> 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> >>>> >>>> * check.c (gfc_check_random_seed): Use new seed size in check. >>>> * intrinsic.texi (RANDOM_NUMBER): Updated documentation. >>>> (RANDOM_SEED): Likewise. >>>> >>>> >>>> testsuite: >>>> >>>> 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> >>>> >>>> * gfortran.dg/random_7.f90: Take into account that the last seed >>>> value is the special p value. >>>> * gfortran.dg/random_seed_1.f90: Seed size is now constant. >>>> >>>> >>>> libgfortran: >>>> 2016-07-27 Janne Blomqvist <jb@gcc.gnu.org> >>>> >>>> * intrinsics/random.c: Replace KISS with xorshift1024* with >>>> per-thread (TLS) state. >>>> * runtime/main.c (init): Don't call random_seed_i4. >>>> >>>> >>>> -- >>>> Janne Blomqvist >>> >>> >>> >>> >> > > > > -- > Janne Blomqvist -- Janne Blomqvist [-- Attachment #2: random_threads_gthr.diff --] [-- Type: text/plain, Size: 11259 bytes --] diff --git a/libgfortran/intrinsics/random.c b/libgfortran/intrinsics/random.c index 516b640..6084ebe 100644 --- a/libgfortran/intrinsics/random.c +++ b/libgfortran/intrinsics/random.c @@ -205,20 +205,39 @@ static uint64_t master_state[] = { }; -/* The actual random number state, as a TLS variable. */ -static _Thread_local xorshift1024star_state rand_state; +static __gthread_key_t rand_state_key; + +static xorshift1024star_state* +get_rand_state (void) +{ + /* For single threaded apps. */ + static xorshift1024star_state rand_state; + + if (__gthread_active_p ()) + { + void* p = __gthread_getspecific (rand_state_key); + if (!p) + { + p = xcalloc (1, sizeof (xorshift1024star_state)); + __gthread_setspecific (rand_state_key, p); + } + return p; + } + else + return &rand_state; +} static uint64_t -xorshift1024star (void) +xorshift1024star (xorshift1024star_state* rs) { - int p = rand_state.p; - const uint64_t s0 = rand_state.s[p]; - uint64_t s1 = rand_state.s[p = (p + 1) & 15]; + int p = rs->p; + const uint64_t s0 = rs->s[p]; + uint64_t s1 = rs->s[p = (p + 1) & 15]; s1 ^= s1 << 31; - rand_state.s[p] = s1 ^ s0 ^ (s1 >> 11) ^ (s0 >> 30); - rand_state.p = p; - return rand_state.s[p] * UINT64_C(1181783497276652981); + rs->s[p] = s1 ^ s0 ^ (s1 >> 11) ^ (s0 >> 30); + rs->p = p; + return rs->s[p] * UINT64_C(1181783497276652981); } @@ -227,7 +246,7 @@ xorshift1024star (void) non-overlapping subsequences for parallel computations. */ static void -jump (void) +jump (xorshift1024star_state* rs) { static const uint64_t JUMP[] = { 0x84242f96eca9c41dULL, 0xa3c65b8776f96855ULL, 0x5b34a39f070b5837ULL, @@ -244,11 +263,11 @@ jump (void) { if (JUMP[i] & 1ULL << b) for(int j = 0; j < 16; j++) - t[j] ^= rand_state.s[(j + rand_state.p) & 15]; - xorshift1024star (); + t[j] ^= rs->s[(j + rs->p) & 15]; + xorshift1024star (rs); } for(int j = 0; j < 16; j++) - rand_state.s[(j + rand_state.p) & 15] = t[j]; + rs->s[(j + rs->p) & 15] = t[j]; } @@ -256,17 +275,17 @@ jump (void) using the master state and the number of times we must jump. */ static void -init_rand_state (const bool locked) +init_rand_state (xorshift1024star_state* rs, const bool locked) { if (!locked) __gthread_mutex_lock (&random_lock); - memcpy (rand_state.s, master_state, sizeof (master_state)); + memcpy (&rs->s, master_state, sizeof (master_state)); unsigned n = njumps++; if (!locked) __gthread_mutex_unlock (&random_lock); for (unsigned i = 0; i < n; i++) - jump (); - rand_state.init = true; + jump (rs); + rs->init = true; } @@ -345,9 +364,11 @@ getosrandom (void *buf, size_t buflen) void random_r4 (GFC_REAL_4 *x) { - if (unlikely (!rand_state.init)) - init_rand_state (false); - uint64_t r = xorshift1024star (); + xorshift1024star_state* rs = get_rand_state(); + + if (unlikely (!rs->init)) + init_rand_state (rs, false); + uint64_t r = xorshift1024star (rs); /* Take the higher bits, ensuring that a stream of real(4), real(8), and real(10) will be identical (except for precision). */ uint32_t high = (uint32_t) (r >> 32); @@ -362,10 +383,11 @@ void random_r8 (GFC_REAL_8 *x) { GFC_UINTEGER_8 r; + xorshift1024star_state* rs = get_rand_state(); - if (unlikely (!rand_state.init)) - init_rand_state (false); - r = xorshift1024star (); + if (unlikely (!rs->init)) + init_rand_state (rs, false); + r = xorshift1024star (rs); rnumber_8 (x, r); } iexport(random_r8); @@ -379,10 +401,11 @@ void random_r10 (GFC_REAL_10 *x) { GFC_UINTEGER_8 r; + xorshift1024star_state* rs = get_rand_state(); - if (unlikely (!rand_state.init)) - init_rand_state (false); - r = xorshift1024star (); + if (unlikely (!rs->init)) + init_rand_state (rs, false); + r = xorshift1024star (rs); rnumber_10 (x, r); } iexport(random_r10); @@ -398,11 +421,12 @@ void random_r16 (GFC_REAL_16 *x) { GFC_UINTEGER_8 r1, r2; + xorshift1024star_state* rs = get_rand_state(); - if (unlikely (!rand_state.init)) - init_rand_state (false); - r1 = xorshift1024star (); - r2 = xorshift1024star (); + if (unlikely (!rs->init)) + init_rand_state (rs, false); + r1 = xorshift1024star (rs); + r2 = xorshift1024star (rs); rnumber_16 (x, r1, r2); } iexport(random_r16); @@ -422,8 +446,10 @@ arandom_r4 (gfc_array_r4 *x) index_type stride0; index_type dim; GFC_REAL_4 *dest; + xorshift1024star_state* rs = get_rand_state(); int n; + dest = x->base_addr; dim = GFC_DESCRIPTOR_RANK (x); @@ -439,13 +465,13 @@ arandom_r4 (gfc_array_r4 *x) stride0 = stride[0]; - if (unlikely (!rand_state.init)) - init_rand_state (false); + if (unlikely (!rs->init)) + init_rand_state (rs, false); while (dest) { /* random_r4 (dest); */ - uint64_t r = xorshift1024star (); + uint64_t r = xorshift1024star (rs); uint32_t high = (uint32_t) (r >> 32); rnumber_4 (dest, high); @@ -489,6 +515,7 @@ arandom_r8 (gfc_array_r8 *x) index_type stride0; index_type dim; GFC_REAL_8 *dest; + xorshift1024star_state* rs = get_rand_state(); int n; dest = x->base_addr; @@ -506,13 +533,13 @@ arandom_r8 (gfc_array_r8 *x) stride0 = stride[0]; - if (unlikely (!rand_state.init)) - init_rand_state (false); + if (unlikely (!rs->init)) + init_rand_state (rs, false); while (dest) { /* random_r8 (dest); */ - uint64_t r = xorshift1024star (); + uint64_t r = xorshift1024star (rs); rnumber_8 (dest, r); /* Advance to the next element. */ @@ -557,6 +584,7 @@ arandom_r10 (gfc_array_r10 *x) index_type stride0; index_type dim; GFC_REAL_10 *dest; + xorshift1024star_state* rs = get_rand_state(); int n; dest = x->base_addr; @@ -574,13 +602,13 @@ arandom_r10 (gfc_array_r10 *x) stride0 = stride[0]; - if (unlikely (!rand_state.init)) - init_rand_state (false); + if (unlikely (!rs->init)) + init_rand_state (rs, false); while (dest) { /* random_r10 (dest); */ - uint64_t r = xorshift1024star (); + uint64_t r = xorshift1024star (rs); rnumber_10 (dest, r); /* Advance to the next element. */ @@ -627,6 +655,7 @@ arandom_r16 (gfc_array_r16 *x) index_type stride0; index_type dim; GFC_REAL_16 *dest; + xorshift1024star_state* rs = get_rand_state(); int n; dest = x->base_addr; @@ -644,14 +673,14 @@ arandom_r16 (gfc_array_r16 *x) stride0 = stride[0]; - if (unlikely (!rand_state.init)) - init_rand_state (false); + if (unlikely (!rs->init)) + init_rand_state (rs, false); while (dest) { /* random_r16 (dest); */ - uint64_t r1 = xorshift1024star (); - uint64_t r2 = xorshift1024star (); + uint64_t r1 = xorshift1024star (rs); + uint64_t r2 = xorshift1024star (rs); rnumber_16 (dest, r1, r2); /* Advance to the next element. */ @@ -724,6 +753,8 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get) if (size != NULL) *size = SZ + 1; + xorshift1024star_state* rs = get_rand_state(); + /* Return the seed to GET data. */ if (get != NULL) { @@ -735,11 +766,11 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get) if (GFC_DESCRIPTOR_EXTENT(get,0) < (index_type) SZ + 1) runtime_error ("Array size of GET is too small."); - if (!rand_state.init) - init_rand_state (false); + if (!rs->init) + init_rand_state (rs, false); /* Unscramble the seed. */ - unscramble_seed (seed, (unsigned char *) rand_state.s, sizeof seed); + unscramble_seed (seed, (unsigned char *) rs->s, sizeof seed); /* Then copy it back to the user variable. */ for (size_t i = 0; i < SZ ; i++) @@ -748,7 +779,7 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get) sizeof(GFC_UINTEGER_4)); /* Finally copy the value of p after the seed. */ - get->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(get, 0)] = rand_state.p; + get->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(get, 0)] = rs->p; } else @@ -761,7 +792,7 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get) { getosrandom (master_state, sizeof (master_state)); njumps = 0; - init_rand_state (true); + init_rand_state (rs, true); } if (put != NULL) @@ -784,9 +815,9 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get) provide seeds with quality only in the lower or upper part. */ scramble_seed ((unsigned char *) master_state, seed, sizeof seed); njumps = 0; - init_rand_state (true); + init_rand_state (rs, true); - rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] % 16; + rs->p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] & 15; } @@ -809,6 +840,8 @@ random_seed_i8 (GFC_INTEGER_8 *size, gfc_array_i8 *put, gfc_array_i8 *get) if (size != NULL) *size = SZ + 1; + xorshift1024star_state* rs = get_rand_state(); + /* Return the seed to GET data. */ if (get != NULL) { @@ -820,15 +853,15 @@ random_seed_i8 (GFC_INTEGER_8 *size, gfc_array_i8 *put, gfc_array_i8 *get) if (GFC_DESCRIPTOR_EXTENT(get,0) < (index_type) SZ + 1) runtime_error ("Array size of GET is too small."); - if (!rand_state.init) - init_rand_state (false); + if (!rs->init) + init_rand_state (rs, false); /* This code now should do correct strides. */ for (size_t i = 0; i < SZ; i++) - memcpy (&(get->base_addr[i * GFC_DESCRIPTOR_STRIDE(get,0)]), &rand_state.s[i], + memcpy (&(get->base_addr[i * GFC_DESCRIPTOR_STRIDE(get,0)]), &rs->s[i], sizeof (GFC_UINTEGER_8)); - get->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(get, 0)] = rand_state.p; + get->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(get, 0)] = rs->p; } else @@ -841,7 +874,7 @@ random_seed_i8 (GFC_INTEGER_8 *size, gfc_array_i8 *put, gfc_array_i8 *get) { getosrandom (master_state, sizeof (master_state)); njumps = 0; - init_rand_state (true); + init_rand_state (rs, true); } if (put != NULL) @@ -860,22 +893,34 @@ random_seed_i8 (GFC_INTEGER_8 *size, gfc_array_i8 *put, gfc_array_i8 *get) sizeof (GFC_UINTEGER_8)); njumps = 0; - init_rand_state (true); - rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] % 16; + init_rand_state (rs, true); + rs->p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] & 15; } - __gthread_mutex_unlock (&random_lock); } } iexport(random_seed_i8); -#ifndef __GTHREAD_MUTEX_INIT +#if !defined __GTHREAD_MUTEX_INIT || defined __GTHREADS static void __attribute__((constructor)) -init (void) +constructor_random (void) { +#ifndef __GTHREAD_MUTEX_INIT __GTHREAD_MUTEX_INIT_FUNCTION (&random_lock); +#endif + if (__gthread_active_p ()) + __gthread_key_create (&rand_state_key, &free); +} +#endif + +#ifdef __GTHREADS +static void __attribute__((destructor)) +destructor_random (void) +{ + if (__gthread_active_p ()) + __gthread_key_delete (rand_state_key); } #endif ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch, libfortran] Multi-threaded random_number 2016-08-11 9:02 ` Janne Blomqvist @ 2016-08-11 14:55 ` Rainer Orth 2016-08-11 15:39 ` Janne Blomqvist 2016-08-12 10:07 ` Thomas Schwinge 1 sibling, 1 reply; 12+ messages in thread From: Rainer Orth @ 2016-08-11 14:55 UTC (permalink / raw) To: Janne Blomqvist; +Cc: Jerry DeLisle, Fortran List, GCC Patches [-- Attachment #1: Type: text/plain, Size: 1254 bytes --] Hi Janne, > committed a slightly modified patch as r239356. Changes from the > submitted patch attached. To my surprise, it turned out that my > fallback code using __gthread_key_{create,delete} and > __ghtread_{get,set}_specific was faster than my TLS code, so I removed > the TLS configure magic and the TLS code and just left the __gthread > stuff in. this patch broke Solaris bootstrap: /vol/gcc/src/hg/trunk/local/libgfortran/intrinsics/random.c: In function 'constructor_random': /vol/gcc/src/hg/trunk/local/libgfortran/intrinsics/ random.c:915:45: error: 'free' undeclared (first use in this function); did you mean 'frexp'? __gthread_key_create (&rand_state_key, &free); ^~~~ frexp /vol/gcc/src/hg/trunk/local/libgfortran/intrinsics/random.c:915:45: note: each undeclared identifier is reported only once for each function it appears in You need to include <stdlib.h> for the free declaration, as this patch does. Allowed i386-pc-solaris2.12 and sparc-sun-solaris2.12 to continue. I'm going to install this as obvious. Rainer 2016-08-11 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> libgfortran: * intrinsics/random.c: Include <stdlib.h>. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: libgfortran-free.patch --] [-- Type: text/x-patch, Size: 353 bytes --] diff --git a/libgfortran/intrinsics/random.c b/libgfortran/intrinsics/random.c --- a/libgfortran/intrinsics/random.c +++ b/libgfortran/intrinsics/random.c @@ -27,6 +27,7 @@ see the files COPYING3 and COPYING.RUNTI #include "libgfortran.h" #include <gthr.h> #include <string.h> +#include <stdlib.h> /* For getosrandom. */ #ifdef HAVE_SYS_TYPES_H [-- Attachment #3: Type: text/plain, Size: 143 bytes --] -- ----------------------------------------------------------------------------- Rainer Orth, Center for Biotechnology, Bielefeld University ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch, libfortran] Multi-threaded random_number 2016-08-11 14:55 ` Rainer Orth @ 2016-08-11 15:39 ` Janne Blomqvist 0 siblings, 0 replies; 12+ messages in thread From: Janne Blomqvist @ 2016-08-11 15:39 UTC (permalink / raw) To: Rainer Orth; +Cc: Jerry DeLisle, Fortran List, GCC Patches On Thu, Aug 11, 2016 at 5:54 PM, Rainer Orth <ro@cebitec.uni-bielefeld.de> wrote: > Hi Janne, > >> committed a slightly modified patch as r239356. Changes from the >> submitted patch attached. To my surprise, it turned out that my >> fallback code using __gthread_key_{create,delete} and >> __ghtread_{get,set}_specific was faster than my TLS code, so I removed >> the TLS configure magic and the TLS code and just left the __gthread >> stuff in. > > this patch broke Solaris bootstrap: > > /vol/gcc/src/hg/trunk/local/libgfortran/intrinsics/random.c: In function 'constructor_random': > /vol/gcc/src/hg/trunk/local/libgfortran/intrinsics/ > random.c:915:45: error: 'free' undeclared (first use in this function); did you mean 'frexp'? > __gthread_key_create (&rand_state_key, &free); > ^~~~ > frexp > /vol/gcc/src/hg/trunk/local/libgfortran/intrinsics/random.c:915:45: note: each undeclared identifier is reported only once for each function it appears in > > You need to include <stdlib.h> for the free declaration, as this patch > does. Allowed i386-pc-solaris2.12 and sparc-sun-solaris2.12 to > continue. I'm going to install this as obvious. Oh, I (incorrectly, obviously!) remembered that stdlib.h would be included via libgfortran.h. Thanks for the quick fix! -- Janne Blomqvist ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch, libfortran] Multi-threaded random_number 2016-08-11 9:02 ` Janne Blomqvist 2016-08-11 14:55 ` Rainer Orth @ 2016-08-12 10:07 ` Thomas Schwinge 2016-08-12 10:37 ` Janne Blomqvist 1 sibling, 1 reply; 12+ messages in thread From: Thomas Schwinge @ 2016-08-12 10:07 UTC (permalink / raw) To: Janne Blomqvist; +Cc: Fortran List, GCC Patches, Jerry DeLisle, cesar [-- Attachment #1: Type: text/plain, Size: 403 bytes --] Hi! On Thu, 11 Aug 2016 12:01:59 +0300, Janne Blomqvist <blomqvist.janne@gmail.com> wrote: > committed a slightly modified patch as r239356. [...] This breaks the build for nvptx (but it may actually be a problem specific to the nvptx-newlib); I filed <https://gcc.gnu.org/PR74755> "libgfortran: build breaks if localtime_r prototype is present, but definition is not". Grüße Thomas [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 472 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch, libfortran] Multi-threaded random_number 2016-08-12 10:07 ` Thomas Schwinge @ 2016-08-12 10:37 ` Janne Blomqvist 0 siblings, 0 replies; 12+ messages in thread From: Janne Blomqvist @ 2016-08-12 10:37 UTC (permalink / raw) To: Thomas Schwinge Cc: Fortran List, GCC Patches, Jerry DeLisle, Cesar Philippidis On Fri, Aug 12, 2016 at 1:07 PM, Thomas Schwinge <thomas@codesourcery.com> wrote: > Hi! > > On Thu, 11 Aug 2016 12:01:59 +0300, Janne Blomqvist <blomqvist.janne@gmail.com> wrote: >> committed a slightly modified patch as r239356. [...] > > This breaks the build for nvptx Sorry about that! > (but it may actually be a problem > specific to the nvptx-newlib); I filed <https://gcc.gnu.org/PR74755> > "libgfortran: build breaks if localtime_r prototype is present, but > definition is not". Hmm, yeah, sounds like something nvptx-newlib should fix? More generally, however, now that you mention it; With the new PRNG we optionally seed it from /dev/urandom, and if that is not found, with gettimeofday() + getpid() (which is why it now includes time_1.h which causes this failure). But if I understand it correctly, none of these approaches work on nvtpx. So is there some sane way to seed the PRNG in an, err, random way? -- Janne Blomqvist ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2016-08-12 10:37 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-07-24 13:45 [Patch, libfortran] Multi-threaded random_number Janne Blomqvist 2016-08-01 10:42 ` Janne Blomqvist 2016-08-01 10:48 ` Andreas Schwab 2016-08-01 11:00 ` Janne Blomqvist 2016-08-08 11:01 ` Janne Blomqvist 2016-08-09 5:01 ` Jerry DeLisle 2016-08-10 11:12 ` Janne Blomqvist 2016-08-11 9:02 ` Janne Blomqvist 2016-08-11 14:55 ` Rainer Orth 2016-08-11 15:39 ` Janne Blomqvist 2016-08-12 10:07 ` Thomas Schwinge 2016-08-12 10:37 ` Janne Blomqvist
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).