public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [Patch, libfortran] Multi-threaded random_number
@ 2016-07-24 13:45 Janne Blomqvist
  2016-08-01 10:42 ` Janne Blomqvist
  2016-08-08 11:01 ` Janne Blomqvist
  0 siblings, 2 replies; 12+ messages in thread
From: Janne Blomqvist @ 2016-07-24 13:45 UTC (permalink / raw)
  To: Fortran List, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2264 bytes --]

Hi,

the attached patch replaces the current random_number / random_seed
implementations with an implementation that better supports threads.
It's an improved version of the RFC patch I posted earlier at
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02110.html . Please see
that earlier message for a longer-winded explanation of what's wrong
with the current implementation and how the patch addresses this.

In short, with the patch the random number generator state is now
per-thread and stored in a per-thread (TLS) variable, enabling a
lockless fast-path. This provides up to 2 orders of magnitude better
performance on a synthetic benchmark using 4 threads, and provides a
more deterministic result as the order that threads are scheduled does
not affect the random number streams for each thread.

Compared to the RFC patch, a number of minor and not-so-minor bugs
have been fixed, so the patch now passes the testsuite (with a few
modifications to the suite, part of the patch). Also, for REAL kinds
4, 8, 10 the generated streams are identical (except precision, of
course) (like the current implementation), enabling precision
comparisons, as requested by Steve Kargl. However, this does not
extend to REAL(16) as that would have necessitated doubling the size
of the state, along with potential issues of slower escape from a
low-entropy state, for a feature that I believe is not used by
particularly many users in the end. So if one wants to do precision
comparisons with REAL(16) one must employ a wrapper routine.

Regtested on x86_64-pc-linux-gnu, Ok for trunk?

frontend ChangeLog:

2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>

    * check.c (gfc_check_random_seed): Use new seed size in check.
    * intrinsic.texi (RANDOM_NUMBER): Updated documentation.
    (RANDOM_SEED): Likewise.


testsuite:

2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>

    * gfortran.dg/random_7.f90: Take into account that the last seed
    value is the special p value.
    * gfortran.dg/random_seed_1.f90: Seed size is now constant.


libgfortran:
2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>

    * intrinsics/random.c: Replace KISS with xorshift1024* with
    per-thread (TLS) state.
    * runtime/main.c (init): Don't call random_seed_i4.


-- 
Janne Blomqvist

[-- Attachment #2: random_threads.diff.gz --]
[-- Type: application/x-gzip, Size: 10253 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch, libfortran] Multi-threaded random_number
  2016-07-24 13:45 [Patch, libfortran] Multi-threaded random_number Janne Blomqvist
@ 2016-08-01 10:42 ` Janne Blomqvist
  2016-08-01 10:48   ` Andreas Schwab
  2016-08-08 11:01 ` Janne Blomqvist
  1 sibling, 1 reply; 12+ messages in thread
From: Janne Blomqvist @ 2016-08-01 10:42 UTC (permalink / raw)
  To: Fortran List, GCC Patches

PING?

Also, a minor bugfix on top of the previous patch:

diff --git a/libgfortran/intrinsics/random.c b/libgfortran/intrinsics/random.c
index 9b54a02..21395ea 100644
--- a/libgfortran/intrinsics/random.c
+++ b/libgfortran/intrinsics/random.c
@@ -787,7 +787,7 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4
*put, gfc_array_i4 *get)
       init_rand_state (true);

       /* Copy p & 15 */
-      rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] & 15;
+      rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] % 15;
     }


@@ -862,7 +862,7 @@ random_seed_i8 (GFC_INTEGER_8 *size, gfc_array_i8
*put, gfc_array_i8 *get)

       njumps = 0;
       init_rand_state (true);
-      rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] & 15;
+      rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] % 15;
      }


On Sun, Jul 24, 2016 at 4:45 PM, Janne Blomqvist
<blomqvist.janne@gmail.com> wrote:
> Hi,
>
> the attached patch replaces the current random_number / random_seed
> implementations with an implementation that better supports threads.
> It's an improved version of the RFC patch I posted earlier at
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02110.html . Please see
> that earlier message for a longer-winded explanation of what's wrong
> with the current implementation and how the patch addresses this.
>
> In short, with the patch the random number generator state is now
> per-thread and stored in a per-thread (TLS) variable, enabling a
> lockless fast-path. This provides up to 2 orders of magnitude better
> performance on a synthetic benchmark using 4 threads, and provides a
> more deterministic result as the order that threads are scheduled does
> not affect the random number streams for each thread.
>
> Compared to the RFC patch, a number of minor and not-so-minor bugs
> have been fixed, so the patch now passes the testsuite (with a few
> modifications to the suite, part of the patch). Also, for REAL kinds
> 4, 8, 10 the generated streams are identical (except precision, of
> course) (like the current implementation), enabling precision
> comparisons, as requested by Steve Kargl. However, this does not
> extend to REAL(16) as that would have necessitated doubling the size
> of the state, along with potential issues of slower escape from a
> low-entropy state, for a feature that I believe is not used by
> particularly many users in the end. So if one wants to do precision
> comparisons with REAL(16) one must employ a wrapper routine.
>
> Regtested on x86_64-pc-linux-gnu, Ok for trunk?
>
> frontend ChangeLog:
>
> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>
>     * check.c (gfc_check_random_seed): Use new seed size in check.
>     * intrinsic.texi (RANDOM_NUMBER): Updated documentation.
>     (RANDOM_SEED): Likewise.
>
>
> testsuite:
>
> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>
>     * gfortran.dg/random_7.f90: Take into account that the last seed
>     value is the special p value.
>     * gfortran.dg/random_seed_1.f90: Seed size is now constant.
>
>
> libgfortran:
> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>
>     * intrinsics/random.c: Replace KISS with xorshift1024* with
>     per-thread (TLS) state.
>     * runtime/main.c (init): Don't call random_seed_i4.
>
>
> --
> Janne Blomqvist



-- 
Janne Blomqvist

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch, libfortran] Multi-threaded random_number
  2016-08-01 10:42 ` Janne Blomqvist
@ 2016-08-01 10:48   ` Andreas Schwab
  2016-08-01 11:00     ` Janne Blomqvist
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Schwab @ 2016-08-01 10:48 UTC (permalink / raw)
  To: Janne Blomqvist; +Cc: Fortran List, GCC Patches

On Mo, Aug 01 2016, Janne Blomqvist <blomqvist.janne@gmail.com> wrote:

> PING?
>
> Also, a minor bugfix on top of the previous patch:
>
> diff --git a/libgfortran/intrinsics/random.c b/libgfortran/intrinsics/random.c
> index 9b54a02..21395ea 100644
> --- a/libgfortran/intrinsics/random.c
> +++ b/libgfortran/intrinsics/random.c
> @@ -787,7 +787,7 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4
> *put, gfc_array_i4 *get)
>        init_rand_state (true);
>
>        /* Copy p & 15 */
> -      rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] & 15;
> +      rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] % 15;

The comment no longer fits.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch, libfortran] Multi-threaded random_number
  2016-08-01 10:48   ` Andreas Schwab
@ 2016-08-01 11:00     ` Janne Blomqvist
  0 siblings, 0 replies; 12+ messages in thread
From: Janne Blomqvist @ 2016-08-01 11:00 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Fortran List, GCC Patches

On Mon, Aug 1, 2016 at 1:48 PM, Andreas Schwab <schwab@suse.de> wrote:
> On Mo, Aug 01 2016, Janne Blomqvist <blomqvist.janne@gmail.com> wrote:
>
>> PING?
>>
>> Also, a minor bugfix on top of the previous patch:
>>
>> diff --git a/libgfortran/intrinsics/random.c b/libgfortran/intrinsics/random.c
>> index 9b54a02..21395ea 100644
>> --- a/libgfortran/intrinsics/random.c
>> +++ b/libgfortran/intrinsics/random.c
>> @@ -787,7 +787,7 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4
>> *put, gfc_array_i4 *get)
>>        init_rand_state (true);
>>
>>        /* Copy p & 15 */
>> -      rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] & 15;
>> +      rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] % 15;
>
> The comment no longer fits.

Indeed, thanks. I've fixed it in my own tree.

-- 
Janne Blomqvist

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch, libfortran] Multi-threaded random_number
  2016-07-24 13:45 [Patch, libfortran] Multi-threaded random_number Janne Blomqvist
  2016-08-01 10:42 ` Janne Blomqvist
@ 2016-08-08 11:01 ` Janne Blomqvist
  2016-08-09  5:01   ` Jerry DeLisle
  1 sibling, 1 reply; 12+ messages in thread
From: Janne Blomqvist @ 2016-08-08 11:01 UTC (permalink / raw)
  To: Fortran List, GCC Patches

PING**2

On Sun, Jul 24, 2016 at 4:45 PM, Janne Blomqvist
<blomqvist.janne@gmail.com> wrote:
> Hi,
>
> the attached patch replaces the current random_number / random_seed
> implementations with an implementation that better supports threads.
> It's an improved version of the RFC patch I posted earlier at
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02110.html . Please see
> that earlier message for a longer-winded explanation of what's wrong
> with the current implementation and how the patch addresses this.
>
> In short, with the patch the random number generator state is now
> per-thread and stored in a per-thread (TLS) variable, enabling a
> lockless fast-path. This provides up to 2 orders of magnitude better
> performance on a synthetic benchmark using 4 threads, and provides a
> more deterministic result as the order that threads are scheduled does
> not affect the random number streams for each thread.
>
> Compared to the RFC patch, a number of minor and not-so-minor bugs
> have been fixed, so the patch now passes the testsuite (with a few
> modifications to the suite, part of the patch). Also, for REAL kinds
> 4, 8, 10 the generated streams are identical (except precision, of
> course) (like the current implementation), enabling precision
> comparisons, as requested by Steve Kargl. However, this does not
> extend to REAL(16) as that would have necessitated doubling the size
> of the state, along with potential issues of slower escape from a
> low-entropy state, for a feature that I believe is not used by
> particularly many users in the end. So if one wants to do precision
> comparisons with REAL(16) one must employ a wrapper routine.
>
> Regtested on x86_64-pc-linux-gnu, Ok for trunk?
>
> frontend ChangeLog:
>
> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>
>     * check.c (gfc_check_random_seed): Use new seed size in check.
>     * intrinsic.texi (RANDOM_NUMBER): Updated documentation.
>     (RANDOM_SEED): Likewise.
>
>
> testsuite:
>
> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>
>     * gfortran.dg/random_7.f90: Take into account that the last seed
>     value is the special p value.
>     * gfortran.dg/random_seed_1.f90: Seed size is now constant.
>
>
> libgfortran:
> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>
>     * intrinsics/random.c: Replace KISS with xorshift1024* with
>     per-thread (TLS) state.
>     * runtime/main.c (init): Don't call random_seed_i4.
>
>
> --
> Janne Blomqvist



-- 
Janne Blomqvist

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch, libfortran] Multi-threaded random_number
  2016-08-08 11:01 ` Janne Blomqvist
@ 2016-08-09  5:01   ` Jerry DeLisle
  2016-08-10 11:12     ` Janne Blomqvist
  0 siblings, 1 reply; 12+ messages in thread
From: Jerry DeLisle @ 2016-08-09  5:01 UTC (permalink / raw)
  To: Janne Blomqvist, Fortran List, GCC Patches

On 08/08/2016 04:01 AM, Janne Blomqvist wrote:
> PING**2
>

OK, thanks for patch.

Jerry
> On Sun, Jul 24, 2016 at 4:45 PM, Janne Blomqvist
> <blomqvist.janne@gmail.com> wrote:
>> Hi,
>>
>> the attached patch replaces the current random_number / random_seed
>> implementations with an implementation that better supports threads.
>> It's an improved version of the RFC patch I posted earlier at
>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02110.html . Please see
>> that earlier message for a longer-winded explanation of what's wrong
>> with the current implementation and how the patch addresses this.
>>
>> In short, with the patch the random number generator state is now
>> per-thread and stored in a per-thread (TLS) variable, enabling a
>> lockless fast-path. This provides up to 2 orders of magnitude better
>> performance on a synthetic benchmark using 4 threads, and provides a
>> more deterministic result as the order that threads are scheduled does
>> not affect the random number streams for each thread.
>>
>> Compared to the RFC patch, a number of minor and not-so-minor bugs
>> have been fixed, so the patch now passes the testsuite (with a few
>> modifications to the suite, part of the patch). Also, for REAL kinds
>> 4, 8, 10 the generated streams are identical (except precision, of
>> course) (like the current implementation), enabling precision
>> comparisons, as requested by Steve Kargl. However, this does not
>> extend to REAL(16) as that would have necessitated doubling the size
>> of the state, along with potential issues of slower escape from a
>> low-entropy state, for a feature that I believe is not used by
>> particularly many users in the end. So if one wants to do precision
>> comparisons with REAL(16) one must employ a wrapper routine.
>>
>> Regtested on x86_64-pc-linux-gnu, Ok for trunk?
>>
>> frontend ChangeLog:
>>
>> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>>
>>     * check.c (gfc_check_random_seed): Use new seed size in check.
>>     * intrinsic.texi (RANDOM_NUMBER): Updated documentation.
>>     (RANDOM_SEED): Likewise.
>>
>>
>> testsuite:
>>
>> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>>
>>     * gfortran.dg/random_7.f90: Take into account that the last seed
>>     value is the special p value.
>>     * gfortran.dg/random_seed_1.f90: Seed size is now constant.
>>
>>
>> libgfortran:
>> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>>
>>     * intrinsics/random.c: Replace KISS with xorshift1024* with
>>     per-thread (TLS) state.
>>     * runtime/main.c (init): Don't call random_seed_i4.
>>
>>
>> --
>> Janne Blomqvist
>
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch, libfortran] Multi-threaded random_number
  2016-08-09  5:01   ` Jerry DeLisle
@ 2016-08-10 11:12     ` Janne Blomqvist
  2016-08-11  9:02       ` Janne Blomqvist
  0 siblings, 1 reply; 12+ messages in thread
From: Janne Blomqvist @ 2016-08-10 11:12 UTC (permalink / raw)
  To: Jerry DeLisle; +Cc: Fortran List, GCC Patches

Hi,

thanks for the Ok. However, moments before committing I got cold feet
and started digging around; it unfortunately seems that TLS
(_Thread_local) is not supported on all targets. So I'll have to
copy-paste some configure magic from libgomp/libjava/etc., and provide
workarounds for such systems. :(

On Tue, Aug 9, 2016 at 8:01 AM, Jerry DeLisle <jvdelisle@charter.net> wrote:
> On 08/08/2016 04:01 AM, Janne Blomqvist wrote:
>>
>> PING**2
>>
>
> OK, thanks for patch.
>
> Jerry
>
>> On Sun, Jul 24, 2016 at 4:45 PM, Janne Blomqvist
>> <blomqvist.janne@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> the attached patch replaces the current random_number / random_seed
>>> implementations with an implementation that better supports threads.
>>> It's an improved version of the RFC patch I posted earlier at
>>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02110.html . Please see
>>> that earlier message for a longer-winded explanation of what's wrong
>>> with the current implementation and how the patch addresses this.
>>>
>>> In short, with the patch the random number generator state is now
>>> per-thread and stored in a per-thread (TLS) variable, enabling a
>>> lockless fast-path. This provides up to 2 orders of magnitude better
>>> performance on a synthetic benchmark using 4 threads, and provides a
>>> more deterministic result as the order that threads are scheduled does
>>> not affect the random number streams for each thread.
>>>
>>> Compared to the RFC patch, a number of minor and not-so-minor bugs
>>> have been fixed, so the patch now passes the testsuite (with a few
>>> modifications to the suite, part of the patch). Also, for REAL kinds
>>> 4, 8, 10 the generated streams are identical (except precision, of
>>> course) (like the current implementation), enabling precision
>>> comparisons, as requested by Steve Kargl. However, this does not
>>> extend to REAL(16) as that would have necessitated doubling the size
>>> of the state, along with potential issues of slower escape from a
>>> low-entropy state, for a feature that I believe is not used by
>>> particularly many users in the end. So if one wants to do precision
>>> comparisons with REAL(16) one must employ a wrapper routine.
>>>
>>> Regtested on x86_64-pc-linux-gnu, Ok for trunk?
>>>
>>> frontend ChangeLog:
>>>
>>> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>>>
>>>     * check.c (gfc_check_random_seed): Use new seed size in check.
>>>     * intrinsic.texi (RANDOM_NUMBER): Updated documentation.
>>>     (RANDOM_SEED): Likewise.
>>>
>>>
>>> testsuite:
>>>
>>> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>>>
>>>     * gfortran.dg/random_7.f90: Take into account that the last seed
>>>     value is the special p value.
>>>     * gfortran.dg/random_seed_1.f90: Seed size is now constant.
>>>
>>>
>>> libgfortran:
>>> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>>>
>>>     * intrinsics/random.c: Replace KISS with xorshift1024* with
>>>     per-thread (TLS) state.
>>>     * runtime/main.c (init): Don't call random_seed_i4.
>>>
>>>
>>> --
>>> Janne Blomqvist
>>
>>
>>
>>
>



-- 
Janne Blomqvist

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch, libfortran] Multi-threaded random_number
  2016-08-10 11:12     ` Janne Blomqvist
@ 2016-08-11  9:02       ` Janne Blomqvist
  2016-08-11 14:55         ` Rainer Orth
  2016-08-12 10:07         ` Thomas Schwinge
  0 siblings, 2 replies; 12+ messages in thread
From: Janne Blomqvist @ 2016-08-11  9:02 UTC (permalink / raw)
  To: Jerry DeLisle; +Cc: Fortran List, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3639 bytes --]

Hi,

committed a slightly modified patch as r239356. Changes from the
submitted patch attached. To my surprise, it turned out that my
fallback code using __gthread_key_{create,delete} and
__ghtread_{get,set}_specific was faster than my TLS code, so I removed
the TLS configure magic and the TLS code and just left the __gthread
stuff in.

On Wed, Aug 10, 2016 at 2:11 PM, Janne Blomqvist
<blomqvist.janne@gmail.com> wrote:
> Hi,
>
> thanks for the Ok. However, moments before committing I got cold feet
> and started digging around; it unfortunately seems that TLS
> (_Thread_local) is not supported on all targets. So I'll have to
> copy-paste some configure magic from libgomp/libjava/etc., and provide
> workarounds for such systems. :(
>
> On Tue, Aug 9, 2016 at 8:01 AM, Jerry DeLisle <jvdelisle@charter.net> wrote:
>> On 08/08/2016 04:01 AM, Janne Blomqvist wrote:
>>>
>>> PING**2
>>>
>>
>> OK, thanks for patch.
>>
>> Jerry
>>
>>> On Sun, Jul 24, 2016 at 4:45 PM, Janne Blomqvist
>>> <blomqvist.janne@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> the attached patch replaces the current random_number / random_seed
>>>> implementations with an implementation that better supports threads.
>>>> It's an improved version of the RFC patch I posted earlier at
>>>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02110.html . Please see
>>>> that earlier message for a longer-winded explanation of what's wrong
>>>> with the current implementation and how the patch addresses this.
>>>>
>>>> In short, with the patch the random number generator state is now
>>>> per-thread and stored in a per-thread (TLS) variable, enabling a
>>>> lockless fast-path. This provides up to 2 orders of magnitude better
>>>> performance on a synthetic benchmark using 4 threads, and provides a
>>>> more deterministic result as the order that threads are scheduled does
>>>> not affect the random number streams for each thread.
>>>>
>>>> Compared to the RFC patch, a number of minor and not-so-minor bugs
>>>> have been fixed, so the patch now passes the testsuite (with a few
>>>> modifications to the suite, part of the patch). Also, for REAL kinds
>>>> 4, 8, 10 the generated streams are identical (except precision, of
>>>> course) (like the current implementation), enabling precision
>>>> comparisons, as requested by Steve Kargl. However, this does not
>>>> extend to REAL(16) as that would have necessitated doubling the size
>>>> of the state, along with potential issues of slower escape from a
>>>> low-entropy state, for a feature that I believe is not used by
>>>> particularly many users in the end. So if one wants to do precision
>>>> comparisons with REAL(16) one must employ a wrapper routine.
>>>>
>>>> Regtested on x86_64-pc-linux-gnu, Ok for trunk?
>>>>
>>>> frontend ChangeLog:
>>>>
>>>> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>>>>
>>>>     * check.c (gfc_check_random_seed): Use new seed size in check.
>>>>     * intrinsic.texi (RANDOM_NUMBER): Updated documentation.
>>>>     (RANDOM_SEED): Likewise.
>>>>
>>>>
>>>> testsuite:
>>>>
>>>> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>>>>
>>>>     * gfortran.dg/random_7.f90: Take into account that the last seed
>>>>     value is the special p value.
>>>>     * gfortran.dg/random_seed_1.f90: Seed size is now constant.
>>>>
>>>>
>>>> libgfortran:
>>>> 2016-07-27  Janne Blomqvist  <jb@gcc.gnu.org>
>>>>
>>>>     * intrinsics/random.c: Replace KISS with xorshift1024* with
>>>>     per-thread (TLS) state.
>>>>     * runtime/main.c (init): Don't call random_seed_i4.
>>>>
>>>>
>>>> --
>>>> Janne Blomqvist
>>>
>>>
>>>
>>>
>>
>
>
>
> --
> Janne Blomqvist



-- 
Janne Blomqvist

[-- Attachment #2: random_threads_gthr.diff --]
[-- Type: text/plain, Size: 11259 bytes --]

diff --git a/libgfortran/intrinsics/random.c b/libgfortran/intrinsics/random.c
index 516b640..6084ebe 100644
--- a/libgfortran/intrinsics/random.c
+++ b/libgfortran/intrinsics/random.c
@@ -205,20 +205,39 @@ static uint64_t master_state[] = {
 };
 
 
-/* The actual random number state, as a TLS variable.  */
-static _Thread_local xorshift1024star_state rand_state;
+static __gthread_key_t rand_state_key;
+
+static xorshift1024star_state*
+get_rand_state (void)
+{
+  /* For single threaded apps.  */
+  static xorshift1024star_state rand_state;
+
+  if (__gthread_active_p ())
+    {
+      void* p = __gthread_getspecific (rand_state_key);
+      if (!p)
+	{
+	  p = xcalloc (1, sizeof (xorshift1024star_state));
+	  __gthread_setspecific (rand_state_key, p);
+	}
+      return p;
+    }
+  else
+    return &rand_state;
+}
 
 
 static uint64_t
-xorshift1024star (void)
+xorshift1024star (xorshift1024star_state* rs)
 {
-  int p = rand_state.p;
-  const uint64_t s0 = rand_state.s[p];
-  uint64_t s1 = rand_state.s[p = (p + 1) & 15];
+  int p = rs->p;
+  const uint64_t s0 = rs->s[p];
+  uint64_t s1 = rs->s[p = (p + 1) & 15];
   s1 ^= s1 << 31;
-  rand_state.s[p] = s1 ^ s0 ^ (s1 >> 11) ^ (s0 >> 30);
-  rand_state.p = p;
-  return rand_state.s[p] * UINT64_C(1181783497276652981);
+  rs->s[p] = s1 ^ s0 ^ (s1 >> 11) ^ (s0 >> 30);
+  rs->p = p;
+  return rs->s[p] * UINT64_C(1181783497276652981);
 }
 
 
@@ -227,7 +246,7 @@ xorshift1024star (void)
    non-overlapping subsequences for parallel computations. */
 
 static void
-jump (void)
+jump (xorshift1024star_state* rs)
 {
   static const uint64_t JUMP[] = {
     0x84242f96eca9c41dULL, 0xa3c65b8776f96855ULL, 0x5b34a39f070b5837ULL,
@@ -244,11 +263,11 @@ jump (void)
       {
 	if (JUMP[i] & 1ULL << b)
 	  for(int j = 0; j < 16; j++)
-	    t[j] ^= rand_state.s[(j + rand_state.p) & 15];
-	xorshift1024star ();
+	    t[j] ^= rs->s[(j + rs->p) & 15];
+	xorshift1024star (rs);
       }
   for(int j = 0; j < 16; j++)
-    rand_state.s[(j + rand_state.p) & 15] = t[j];
+    rs->s[(j + rs->p) & 15] = t[j];
 }
 
 
@@ -256,17 +275,17 @@ jump (void)
    using the master state and the number of times we must jump.  */
 
 static void
-init_rand_state (const bool locked)
+init_rand_state (xorshift1024star_state* rs, const bool locked)
 {
   if (!locked)
     __gthread_mutex_lock (&random_lock);
-  memcpy (rand_state.s, master_state, sizeof (master_state));
+  memcpy (&rs->s, master_state, sizeof (master_state));
   unsigned n = njumps++;
   if (!locked)
     __gthread_mutex_unlock (&random_lock);
   for (unsigned i = 0; i < n; i++)
-    jump ();
-  rand_state.init = true;
+    jump (rs);
+  rs->init = true;
 }
 
 
@@ -345,9 +364,11 @@ getosrandom (void *buf, size_t buflen)
 void
 random_r4 (GFC_REAL_4 *x)
 {
-  if (unlikely (!rand_state.init))
-    init_rand_state (false);
-  uint64_t r = xorshift1024star ();
+  xorshift1024star_state* rs = get_rand_state();
+
+  if (unlikely (!rs->init))
+    init_rand_state (rs, false);
+  uint64_t r = xorshift1024star (rs);
   /* Take the higher bits, ensuring that a stream of real(4), real(8),
      and real(10) will be identical (except for precision).  */
   uint32_t high = (uint32_t) (r >> 32);
@@ -362,10 +383,11 @@ void
 random_r8 (GFC_REAL_8 *x)
 {
   GFC_UINTEGER_8 r;
+  xorshift1024star_state* rs = get_rand_state();
 
-  if (unlikely (!rand_state.init))
-    init_rand_state (false);
-  r = xorshift1024star ();
+  if (unlikely (!rs->init))
+    init_rand_state (rs, false);
+  r = xorshift1024star (rs);
   rnumber_8 (x, r);
 }
 iexport(random_r8);
@@ -379,10 +401,11 @@ void
 random_r10 (GFC_REAL_10 *x)
 {
   GFC_UINTEGER_8 r;
+  xorshift1024star_state* rs = get_rand_state();
 
-  if (unlikely (!rand_state.init))
-    init_rand_state (false);
-  r = xorshift1024star ();
+  if (unlikely (!rs->init))
+    init_rand_state (rs, false);
+  r = xorshift1024star (rs);
   rnumber_10 (x, r);
 }
 iexport(random_r10);
@@ -398,11 +421,12 @@ void
 random_r16 (GFC_REAL_16 *x)
 {
   GFC_UINTEGER_8 r1, r2;
+  xorshift1024star_state* rs = get_rand_state();
 
-  if (unlikely (!rand_state.init))
-    init_rand_state (false);
-  r1 = xorshift1024star ();
-  r2 = xorshift1024star ();
+  if (unlikely (!rs->init))
+    init_rand_state (rs, false);
+  r1 = xorshift1024star (rs);
+  r2 = xorshift1024star (rs);
   rnumber_16 (x, r1, r2);
 }
 iexport(random_r16);
@@ -422,8 +446,10 @@ arandom_r4 (gfc_array_r4 *x)
   index_type stride0;
   index_type dim;
   GFC_REAL_4 *dest;
+  xorshift1024star_state* rs = get_rand_state();
   int n;
 
+
   dest = x->base_addr;
 
   dim = GFC_DESCRIPTOR_RANK (x);
@@ -439,13 +465,13 @@ arandom_r4 (gfc_array_r4 *x)
 
   stride0 = stride[0];
 
-  if (unlikely (!rand_state.init))
-    init_rand_state (false);
+  if (unlikely (!rs->init))
+    init_rand_state (rs, false);
 
   while (dest)
     {
       /* random_r4 (dest);  */
-      uint64_t r = xorshift1024star ();
+      uint64_t r = xorshift1024star (rs);
       uint32_t high = (uint32_t) (r >> 32);
       rnumber_4 (dest, high);
 
@@ -489,6 +515,7 @@ arandom_r8 (gfc_array_r8 *x)
   index_type stride0;
   index_type dim;
   GFC_REAL_8 *dest;
+  xorshift1024star_state* rs = get_rand_state();
   int n;
 
   dest = x->base_addr;
@@ -506,13 +533,13 @@ arandom_r8 (gfc_array_r8 *x)
 
   stride0 = stride[0];
 
-  if (unlikely (!rand_state.init))
-    init_rand_state (false);
+  if (unlikely (!rs->init))
+    init_rand_state (rs, false);
 
   while (dest)
     {
       /* random_r8 (dest);  */
-      uint64_t r = xorshift1024star ();
+      uint64_t r = xorshift1024star (rs);
       rnumber_8 (dest, r);
 
       /* Advance to the next element.  */
@@ -557,6 +584,7 @@ arandom_r10 (gfc_array_r10 *x)
   index_type stride0;
   index_type dim;
   GFC_REAL_10 *dest;
+  xorshift1024star_state* rs = get_rand_state();
   int n;
 
   dest = x->base_addr;
@@ -574,13 +602,13 @@ arandom_r10 (gfc_array_r10 *x)
 
   stride0 = stride[0];
 
-  if (unlikely (!rand_state.init))
-    init_rand_state (false);
+  if (unlikely (!rs->init))
+    init_rand_state (rs, false);
 
   while (dest)
     {
       /* random_r10 (dest);  */
-      uint64_t r = xorshift1024star ();
+      uint64_t r = xorshift1024star (rs);
       rnumber_10 (dest, r);
 
       /* Advance to the next element.  */
@@ -627,6 +655,7 @@ arandom_r16 (gfc_array_r16 *x)
   index_type stride0;
   index_type dim;
   GFC_REAL_16 *dest;
+  xorshift1024star_state* rs = get_rand_state();
   int n;
 
   dest = x->base_addr;
@@ -644,14 +673,14 @@ arandom_r16 (gfc_array_r16 *x)
 
   stride0 = stride[0];
 
-  if (unlikely (!rand_state.init))
-    init_rand_state (false);
+  if (unlikely (!rs->init))
+    init_rand_state (rs, false);
 
   while (dest)
     {
       /* random_r16 (dest);  */
-      uint64_t r1 = xorshift1024star ();
-      uint64_t r2 = xorshift1024star ();
+      uint64_t r1 = xorshift1024star (rs);
+      uint64_t r2 = xorshift1024star (rs);
       rnumber_16 (dest, r1, r2);
 
       /* Advance to the next element.  */
@@ -724,6 +753,8 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get)
   if (size != NULL)
     *size = SZ + 1;
 
+  xorshift1024star_state* rs = get_rand_state();
+
   /* Return the seed to GET data.  */
   if (get != NULL)
     {
@@ -735,11 +766,11 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get)
       if (GFC_DESCRIPTOR_EXTENT(get,0) < (index_type) SZ + 1)
 	runtime_error ("Array size of GET is too small.");
 
-      if (!rand_state.init)
-	init_rand_state (false);
+      if (!rs->init)
+	init_rand_state (rs, false);
 
       /* Unscramble the seed.  */
-      unscramble_seed (seed, (unsigned char *) rand_state.s, sizeof seed);
+      unscramble_seed (seed, (unsigned char *) rs->s, sizeof seed);
 
       /*  Then copy it back to the user variable.  */
       for (size_t i = 0; i < SZ ; i++)
@@ -748,7 +779,7 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get)
                sizeof(GFC_UINTEGER_4));
 
       /* Finally copy the value of p after the seed.  */
-      get->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(get, 0)] = rand_state.p;
+      get->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(get, 0)] = rs->p;
     }
 
   else
@@ -761,7 +792,7 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get)
     {
       getosrandom (master_state, sizeof (master_state));
       njumps = 0;
-      init_rand_state (true);
+      init_rand_state (rs, true);
     }
 
   if (put != NULL)
@@ -784,9 +815,9 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get)
 	 provide seeds with quality only in the lower or upper part.  */
       scramble_seed ((unsigned char *) master_state, seed, sizeof seed);
       njumps = 0;
-      init_rand_state (true);
+      init_rand_state (rs, true);
 
-      rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] % 16;
+      rs->p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] & 15;
     }
 
 
@@ -809,6 +840,8 @@ random_seed_i8 (GFC_INTEGER_8 *size, gfc_array_i8 *put, gfc_array_i8 *get)
   if (size != NULL)
     *size = SZ + 1;
 
+  xorshift1024star_state* rs = get_rand_state();
+
   /* Return the seed to GET data.  */
   if (get != NULL)
     {
@@ -820,15 +853,15 @@ random_seed_i8 (GFC_INTEGER_8 *size, gfc_array_i8 *put, gfc_array_i8 *get)
       if (GFC_DESCRIPTOR_EXTENT(get,0) < (index_type) SZ + 1)
 	runtime_error ("Array size of GET is too small.");
 
-      if (!rand_state.init)
-	init_rand_state (false);
+      if (!rs->init)
+	init_rand_state (rs, false);
 
       /*  This code now should do correct strides.  */
       for (size_t i = 0; i < SZ; i++)
-	memcpy (&(get->base_addr[i * GFC_DESCRIPTOR_STRIDE(get,0)]), &rand_state.s[i],
+	memcpy (&(get->base_addr[i * GFC_DESCRIPTOR_STRIDE(get,0)]), &rs->s[i],
 		sizeof (GFC_UINTEGER_8));
 
-      get->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(get, 0)] = rand_state.p;
+      get->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(get, 0)] = rs->p;
     }
 
   else
@@ -841,7 +874,7 @@ random_seed_i8 (GFC_INTEGER_8 *size, gfc_array_i8 *put, gfc_array_i8 *get)
     {
       getosrandom (master_state, sizeof (master_state));
       njumps = 0;
-      init_rand_state (true);
+      init_rand_state (rs, true);
     }
 
   if (put != NULL)
@@ -860,22 +893,34 @@ random_seed_i8 (GFC_INTEGER_8 *size, gfc_array_i8 *put, gfc_array_i8 *get)
 		sizeof (GFC_UINTEGER_8));
 
       njumps = 0;
-      init_rand_state (true);
-      rand_state.p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] % 16;
+      init_rand_state (rs, true);
+      rs->p = put->base_addr[SZ * GFC_DESCRIPTOR_STRIDE(put, 0)] & 15;
      }
 
 
-
   __gthread_mutex_unlock (&random_lock);
     }
 }
 iexport(random_seed_i8);
 
 
-#ifndef __GTHREAD_MUTEX_INIT
+#if !defined __GTHREAD_MUTEX_INIT || defined __GTHREADS
 static void __attribute__((constructor))
-init (void)
+constructor_random (void)
 {
+#ifndef __GTHREAD_MUTEX_INIT
   __GTHREAD_MUTEX_INIT_FUNCTION (&random_lock);
+#endif
+  if (__gthread_active_p ())
+    __gthread_key_create (&rand_state_key, &free);
+}
+#endif
+
+#ifdef __GTHREADS
+static void __attribute__((destructor))
+destructor_random (void)
+{
+  if (__gthread_active_p ())
+    __gthread_key_delete (rand_state_key);
 }
 #endif

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch, libfortran] Multi-threaded random_number
  2016-08-11  9:02       ` Janne Blomqvist
@ 2016-08-11 14:55         ` Rainer Orth
  2016-08-11 15:39           ` Janne Blomqvist
  2016-08-12 10:07         ` Thomas Schwinge
  1 sibling, 1 reply; 12+ messages in thread
From: Rainer Orth @ 2016-08-11 14:55 UTC (permalink / raw)
  To: Janne Blomqvist; +Cc: Jerry DeLisle, Fortran List, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1254 bytes --]

Hi Janne,

> committed a slightly modified patch as r239356. Changes from the
> submitted patch attached. To my surprise, it turned out that my
> fallback code using __gthread_key_{create,delete} and
> __ghtread_{get,set}_specific was faster than my TLS code, so I removed
> the TLS configure magic and the TLS code and just left the __gthread
> stuff in.

this patch broke Solaris bootstrap:

/vol/gcc/src/hg/trunk/local/libgfortran/intrinsics/random.c: In function 'constructor_random':
/vol/gcc/src/hg/trunk/local/libgfortran/intrinsics/
random.c:915:45: error: 'free' undeclared (first use in this function); did you mean 'frexp'?
     __gthread_key_create (&rand_state_key, &free);
                                             ^~~~
                                             frexp
/vol/gcc/src/hg/trunk/local/libgfortran/intrinsics/random.c:915:45: note: each undeclared identifier is reported only once for each function it appears in

You need to include <stdlib.h> for the free declaration, as this patch
does.  Allowed i386-pc-solaris2.12 and sparc-sun-solaris2.12 to
continue.  I'm going to install this as obvious.

	Rainer


2016-08-11  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

	libgfortran:
	* intrinsics/random.c: Include <stdlib.h>.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: libgfortran-free.patch --]
[-- Type: text/x-patch, Size: 353 bytes --]

diff --git a/libgfortran/intrinsics/random.c b/libgfortran/intrinsics/random.c
--- a/libgfortran/intrinsics/random.c
+++ b/libgfortran/intrinsics/random.c
@@ -27,6 +27,7 @@ see the files COPYING3 and COPYING.RUNTI
 #include "libgfortran.h"
 #include <gthr.h>
 #include <string.h>
+#include <stdlib.h>
 
 /* For getosrandom.  */
 #ifdef HAVE_SYS_TYPES_H

[-- Attachment #3: Type: text/plain, Size: 143 bytes --]


-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch, libfortran] Multi-threaded random_number
  2016-08-11 14:55         ` Rainer Orth
@ 2016-08-11 15:39           ` Janne Blomqvist
  0 siblings, 0 replies; 12+ messages in thread
From: Janne Blomqvist @ 2016-08-11 15:39 UTC (permalink / raw)
  To: Rainer Orth; +Cc: Jerry DeLisle, Fortran List, GCC Patches

On Thu, Aug 11, 2016 at 5:54 PM, Rainer Orth
<ro@cebitec.uni-bielefeld.de> wrote:
> Hi Janne,
>
>> committed a slightly modified patch as r239356. Changes from the
>> submitted patch attached. To my surprise, it turned out that my
>> fallback code using __gthread_key_{create,delete} and
>> __ghtread_{get,set}_specific was faster than my TLS code, so I removed
>> the TLS configure magic and the TLS code and just left the __gthread
>> stuff in.
>
> this patch broke Solaris bootstrap:
>
> /vol/gcc/src/hg/trunk/local/libgfortran/intrinsics/random.c: In function 'constructor_random':
> /vol/gcc/src/hg/trunk/local/libgfortran/intrinsics/
> random.c:915:45: error: 'free' undeclared (first use in this function); did you mean 'frexp'?
>      __gthread_key_create (&rand_state_key, &free);
>                                              ^~~~
>                                              frexp
> /vol/gcc/src/hg/trunk/local/libgfortran/intrinsics/random.c:915:45: note: each undeclared identifier is reported only once for each function it appears in
>
> You need to include <stdlib.h> for the free declaration, as this patch
> does.  Allowed i386-pc-solaris2.12 and sparc-sun-solaris2.12 to
> continue.  I'm going to install this as obvious.

Oh, I (incorrectly, obviously!) remembered that stdlib.h would be
included via libgfortran.h. Thanks for the quick fix!


-- 
Janne Blomqvist

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch, libfortran] Multi-threaded random_number
  2016-08-11  9:02       ` Janne Blomqvist
  2016-08-11 14:55         ` Rainer Orth
@ 2016-08-12 10:07         ` Thomas Schwinge
  2016-08-12 10:37           ` Janne Blomqvist
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Schwinge @ 2016-08-12 10:07 UTC (permalink / raw)
  To: Janne Blomqvist; +Cc: Fortran List, GCC Patches, Jerry DeLisle, cesar

[-- Attachment #1: Type: text/plain, Size: 403 bytes --]

Hi!

On Thu, 11 Aug 2016 12:01:59 +0300, Janne Blomqvist <blomqvist.janne@gmail.com> wrote:
> committed a slightly modified patch as r239356. [...]

This breaks the build for nvptx (but it may actually be a problem
specific to the nvptx-newlib); I filed <https://gcc.gnu.org/PR74755>
"libgfortran: build breaks if localtime_r prototype is present, but
definition is not".


Grüße
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch, libfortran] Multi-threaded random_number
  2016-08-12 10:07         ` Thomas Schwinge
@ 2016-08-12 10:37           ` Janne Blomqvist
  0 siblings, 0 replies; 12+ messages in thread
From: Janne Blomqvist @ 2016-08-12 10:37 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Fortran List, GCC Patches, Jerry DeLisle, Cesar Philippidis

On Fri, Aug 12, 2016 at 1:07 PM, Thomas Schwinge
<thomas@codesourcery.com> wrote:
> Hi!
>
> On Thu, 11 Aug 2016 12:01:59 +0300, Janne Blomqvist <blomqvist.janne@gmail.com> wrote:
>> committed a slightly modified patch as r239356. [...]
>
> This breaks the build for nvptx

Sorry about that!

> (but it may actually be a problem
> specific to the nvptx-newlib); I filed <https://gcc.gnu.org/PR74755>
> "libgfortran: build breaks if localtime_r prototype is present, but
> definition is not".

Hmm, yeah, sounds like something nvptx-newlib should fix?

More generally, however, now that you mention it; With the new PRNG we
optionally seed it from /dev/urandom, and if that is not found, with
gettimeofday() + getpid() (which is why it now includes time_1.h which
causes this failure). But if I understand it correctly, none of these
approaches work on nvtpx. So is there some sane way to seed the PRNG
in an, err, random way?

-- 
Janne Blomqvist

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-08-12 10:37 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-24 13:45 [Patch, libfortran] Multi-threaded random_number Janne Blomqvist
2016-08-01 10:42 ` Janne Blomqvist
2016-08-01 10:48   ` Andreas Schwab
2016-08-01 11:00     ` Janne Blomqvist
2016-08-08 11:01 ` Janne Blomqvist
2016-08-09  5:01   ` Jerry DeLisle
2016-08-10 11:12     ` Janne Blomqvist
2016-08-11  9:02       ` Janne Blomqvist
2016-08-11 14:55         ` Rainer Orth
2016-08-11 15:39           ` Janne Blomqvist
2016-08-12 10:07         ` Thomas Schwinge
2016-08-12 10:37           ` Janne Blomqvist

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).