public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] stdlib: Tuned down tst-arc4random-thread internal parameters
@ 2022-07-27 12:06 Adhemerval Zanella
  2022-07-27 12:25 ` Florian Weimer
  0 siblings, 1 reply; 9+ messages in thread
From: Adhemerval Zanella @ 2022-07-27 12:06 UTC (permalink / raw)
  To: libc-alpha, Florian Weimer, Jason A . Donenfeld, Carlos O'Donell

With new arc4random implementation, the internal parameters might
require a lot of runtime and/or trigger some contention on older
kernels (which might trigger spurious timeout failures).

Also, since we are now testing getrandom entropy instead of an
userspace RNG, there is no much need to extensive testing.

With this change the tst-arc4random-thread goes from about 1m to
5s on a Ryzen 9 with 5.15.0-41-generic.

Checked on x86_64-linux-gnu.
---
 stdlib/tst-arc4random-thread.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/stdlib/tst-arc4random-thread.c b/stdlib/tst-arc4random-thread.c
index 3373d4d446..9d25f3526b 100644
--- a/stdlib/tst-arc4random-thread.c
+++ b/stdlib/tst-arc4random-thread.c
@@ -24,16 +24,19 @@
 #include <support/namespace.h>
 #include <support/support.h>
 #include <support/xthread.h>
+#include <unistd.h>
 
 /* Number of arc4random_buf calls per thread.  */
-enum { count_per_thread = 5000 };
+enum { count_per_thread = 2048 };
+
+/* Number of threads in subprocess.  */
+enum { fork_threads = 3 };
 
 /* Number of threads computing randomness.  */
-enum { inner_threads = 5 };
+enum { inner_threads = 4 };
 
-/* Number of threads launching other threads.  Chosen as to not to
-   overload the system.  */
-enum { outer_threads = 7 };
+/* Number of threads launching other threads.  */
+static int outer_threads = 1;
 
 /* Number of launching rounds performed by the outer threads.  */
 enum { outer_rounds = 10 };
@@ -331,6 +334,15 @@ do_test_func (const char *fname, void (*func)(unsigned char *, size_t))
 static int
 do_test (void)
 {
+  /* Do not run more threads than the maximum of online CPUs.  */
+  long int cpus = sysconf (_SC_NPROCESSORS_ONLN);
+  if (cpus != -1)
+    /* Limit the number to not overload the system.  */
+    outer_threads = (cpus / 2) / inner_threads;
+
+  printf ("info: outer_threads=%d inner_threads=%d\n", outer_threads,
+	  inner_threads);
+
   do_test_func ("arc4random", generate_arc4random);
   do_test_func ("arc4random_buf", generate_arc4random_buf);
   do_test_func ("arc4random_uniform", generate_arc4random_uniform);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] stdlib: Tuned down tst-arc4random-thread internal parameters
  2022-07-27 12:06 [PATCH] stdlib: Tuned down tst-arc4random-thread internal parameters Adhemerval Zanella
@ 2022-07-27 12:25 ` Florian Weimer
  2022-07-27 12:29   ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2022-07-27 12:25 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha, Jason A . Donenfeld, Carlos O'Donell

* Adhemerval Zanella:

>  static int
>  do_test (void)
>  {
> +  /* Do not run more threads than the maximum of online CPUs.  */
> +  long int cpus = sysconf (_SC_NPROCESSORS_ONLN);
> +  if (cpus != -1)
> +    /* Limit the number to not overload the system.  */
> +    outer_threads = (cpus / 2) / inner_threads;

Doesn't this make outer_threads on many systems?

By the way, I think we should switch to the standard arc4random_uniform
implementation that doesn't try to conserve bits.  It's cheaper to get a
larger number of bits once instead of obtaining 24 bits first, then 8
bits.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] stdlib: Tuned down tst-arc4random-thread internal parameters
  2022-07-27 12:25 ` Florian Weimer
@ 2022-07-27 12:29   ` Adhemerval Zanella Netto
  2022-07-27 12:42     ` Florian Weimer
  0 siblings, 1 reply; 9+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-27 12:29 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, Jason A . Donenfeld, Carlos O'Donell



On 27/07/22 09:25, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>>  static int
>>  do_test (void)
>>  {
>> +  /* Do not run more threads than the maximum of online CPUs.  */
>> +  long int cpus = sysconf (_SC_NPROCESSORS_ONLN);
>> +  if (cpus != -1)
>> +    /* Limit the number to not overload the system.  */
>> +    outer_threads = (cpus / 2) / inner_threads;
> 
> Doesn't this make outer_threads on many systems?

I did not quite follow, make outer_threads large do you mean?

> 
> By the way, I think we should switch to the standard arc4random_uniform
> implementation that doesn't try to conserve bits.  It's cheaper to get a
> larger number of bits once instead of obtaining 24 bits first, then 8
> bits.

Yeah, it should simpler indeed.  But I do not consider this urgent
for 2.36.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] stdlib: Tuned down tst-arc4random-thread internal parameters
  2022-07-27 12:29   ` Adhemerval Zanella Netto
@ 2022-07-27 12:42     ` Florian Weimer
  2022-07-27 13:00       ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2022-07-27 12:42 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Jason A . Donenfeld, Carlos O'Donell

* Adhemerval Zanella Netto:

> On 27/07/22 09:25, Florian Weimer wrote:
>> * Adhemerval Zanella:
>> 
>>>  static int
>>>  do_test (void)
>>>  {
>>> +  /* Do not run more threads than the maximum of online CPUs.  */
>>> +  long int cpus = sysconf (_SC_NPROCESSORS_ONLN);
>>> +  if (cpus != -1)
>>> +    /* Limit the number to not overload the system.  */
>>> +    outer_threads = (cpus / 2) / inner_threads;
>> 
>> Doesn't this make outer_threads on many systems?
>
> I did not quite follow, make outer_threads large do you mean?

Sorry, make outer_threads *zero*.

>> By the way, I think we should switch to the standard arc4random_uniform
>> implementation that doesn't try to conserve bits.  It's cheaper to get a
>> larger number of bits once instead of obtaining 24 bits first, then 8
>> bits.
>
> Yeah, it should simpler indeed.  But I do not consider this urgent
> for 2.36.

Well, the standard way probably has more obvious statistical properties
and is harder to screw up. 8-/

Thanks,
Florian


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] stdlib: Tuned down tst-arc4random-thread internal parameters
  2022-07-27 12:42     ` Florian Weimer
@ 2022-07-27 13:00       ` Adhemerval Zanella Netto
  2022-07-27 15:36         ` Florian Weimer
  0 siblings, 1 reply; 9+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-27 13:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, Jason A . Donenfeld, Carlos O'Donell



On 27/07/22 09:42, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
> 
>> On 27/07/22 09:25, Florian Weimer wrote:
>>> * Adhemerval Zanella:
>>>
>>>>  static int
>>>>  do_test (void)
>>>>  {
>>>> +  /* Do not run more threads than the maximum of online CPUs.  */
>>>> +  long int cpus = sysconf (_SC_NPROCESSORS_ONLN);
>>>> +  if (cpus != -1)
>>>> +    /* Limit the number to not overload the system.  */
>>>> +    outer_threads = (cpus / 2) / inner_threads;
>>>
>>> Doesn't this make outer_threads on many systems?
>>
>> I did not quite follow, make outer_threads large do you mean?
> 
> Sorry, make outer_threads *zero*.

Yeah, it should be "(cpus / 2) / inner_threads ?: 1". I will send
an update.

> 
>>> By the way, I think we should switch to the standard arc4random_uniform
>>> implementation that doesn't try to conserve bits.  It's cheaper to get a
>>> larger number of bits once instead of obtaining 24 bits first, then 8
>>> bits.
>>
>> Yeah, it should simpler indeed.  But I do not consider this urgent
>> for 2.36.
> 
> Well, the standard way probably has more obvious statistical properties
> and is harder to screw up. 8-/

Right, do you consider this a blocker? I think I can send a patch to 
use a simpler algorithm.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] stdlib: Tuned down tst-arc4random-thread internal parameters
  2022-07-27 13:00       ` Adhemerval Zanella Netto
@ 2022-07-27 15:36         ` Florian Weimer
  2022-07-27 15:57           ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2022-07-27 15:36 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Jason A . Donenfeld, Carlos O'Donell

* Adhemerval Zanella Netto:

>>>> By the way, I think we should switch to the standard arc4random_uniform
>>>> implementation that doesn't try to conserve bits.  It's cheaper to get a
>>>> larger number of bits once instead of obtaining 24 bits first, then 8
>>>> bits.
>>>
>>> Yeah, it should simpler indeed.  But I do not consider this urgent
>>> for 2.36.
>> 
>> Well, the standard way probably has more obvious statistical properties
>> and is harder to screw up. 8-/
>
> Right, do you consider this a blocker? I think I can send a patch to 
> use a simpler algorithm.

I like it because it would minimize risk.  It's not a strict blocker.
I'm not sure if I'll be able to work on it this week.  Maybe I can
review a patch.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] stdlib: Tuned down tst-arc4random-thread internal parameters
  2022-07-27 15:36         ` Florian Weimer
@ 2022-07-27 15:57           ` Adhemerval Zanella Netto
  2022-07-27 17:10             ` Yann Droneaud
  0 siblings, 1 reply; 9+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-27 15:57 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, Jason A . Donenfeld, Carlos O'Donell



On 27/07/22 12:36, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
> 
>>>>> By the way, I think we should switch to the standard arc4random_uniform
>>>>> implementation that doesn't try to conserve bits.  It's cheaper to get a
>>>>> larger number of bits once instead of obtaining 24 bits first, then 8
>>>>> bits.
>>>>
>>>> Yeah, it should simpler indeed.  But I do not consider this urgent
>>>> for 2.36.
>>>
>>> Well, the standard way probably has more obvious statistical properties
>>> and is harder to screw up. 8-/
>>
>> Right, do you consider this a blocker? I think I can send a patch to 
>> use a simpler algorithm.
> 
> I like it because it would minimize risk.  It's not a strict blocker.
> I'm not sure if I'll be able to work on it this week.  Maybe I can
> review a patch.

Without trying to be clever, what about something like the below, adapted
from Bitmask with Rejection [1].  It should be fast on most case since it
avoid the modulo operation, and the last part that tries to reuse the
extra entropy bits are not strictly required.


uint32_t
__arc4random_uniform (uint32_t n)
{
  if (n <= 1)
    /* There is no valid return value for a zero limit, and 0 is the
       only possible result for limit 1.  */
    return 0;

  /* mask is the smallest power of 2 minus 1 which is larger than n.  */
  int z = __builtin_clz (n);
  int bits = CHAR_BIT * sizeof (uint32_t) - z;
  uint32_t mask = ~UINT32_C(0) >> z;

  while (1)
    {
      uint32_t value = __arc4random ();

      /* Return is the lower power of 2 minus 1 satisfy the condition.  */
      uint32_t r = value & mask;
      if (r < n)
        return r;

      /* Otherwise check if remaining bits of entropy provides fits in the
         bound.  */
      int bits_left = z;
      while (bits_left >= bits)
        {
          value >>= bits;
          r = value & mask;
          if (r < n)
            return r;
          bits_left -= bits;
        }
    }
}

[1] https://www.pcg-random.org/posts/bounded-rands.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] stdlib: Tuned down tst-arc4random-thread internal parameters
  2022-07-27 15:57           ` Adhemerval Zanella Netto
@ 2022-07-27 17:10             ` Yann Droneaud
  2022-07-27 19:18               ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 9+ messages in thread
From: Yann Droneaud @ 2022-07-27 17:10 UTC (permalink / raw)
  To: libc-alpha

Hi,

Le 27/07/2022 à 17:57, Adhemerval Zanella Netto via Libc-alpha a écrit :
>
> On 27/07/22 12:36, Florian Weimer wrote:
>> * Adhemerval Zanella Netto:
>>
>>>>>> By the way, I think we should switch to the standard arc4random_uniform
>>>>>> implementation that doesn't try to conserve bits.  It's cheaper to get a
>>>>>> larger number of bits once instead of obtaining 24 bits first, then 8
>>>>>> bits.
>>>>> Yeah, it should simpler indeed.  But I do not consider this urgent
>>>>> for 2.36.
>>>> Well, the standard way probably has more obvious statistical properties
>>>> and is harder to screw up. 8-/
>>> Right, do you consider this a blocker? I think I can send a patch to
>>> use a simpler algorithm.
>> I like it because it would minimize risk.  It's not a strict blocker.
>> I'm not sure if I'll be able to work on it this week.  Maybe I can
>> review a patch.
> Without trying to be clever, what about something like the below, adapted
> from Bitmask with Rejection [1].  It should be fast on most case since it
> avoid the modulo operation, and the last part that tries to reuse the
> extra entropy bits are not strictly required.
>
>
> uint32_t
> __arc4random_uniform (uint32_t n)
> {
>    if (n <= 1)
>      /* There is no valid return value for a zero limit, and 0 is the
>         only possible result for limit 1.  */
>      return 0;
>
>    /* mask is the smallest power of 2 minus 1 which is larger than n.  */
>    int z = __builtin_clz (n);


For the special case where n is power of two, I think it should be 
__builtin_clz (n - 1).

For example n = 8, arc4random_uniform() returns a value up to 7, but 
mask would be 0x1111, while 0x111 would have been enough.


>    int bits = CHAR_BIT * sizeof (uint32_t) - z;
>    uint32_t mask = ~UINT32_C(0) >> z;
>
>    while (1)
>      {
>        uint32_t value = __arc4random ();
>
>        /* Return is the lower power of 2 minus 1 satisfy the condition.  */
>        uint32_t r = value & mask;
>        if (r < n)
>          return r;
>
>        /* Otherwise check if remaining bits of entropy provides fits in the
>           bound.  */
>        int bits_left = z;
>        while (bits_left >= bits)
>          {
>            value >>= bits;
>            r = value & mask;
>            if (r < n)
>              return r;
>            bits_left -= bits;
>          }
>      }
> }
>
> [1] https://www.pcg-random.org/posts/bounded-rands.html

Thanks

-- 

Yann Droneaud

OPTEYA



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] stdlib: Tuned down tst-arc4random-thread internal parameters
  2022-07-27 17:10             ` Yann Droneaud
@ 2022-07-27 19:18               ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 9+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-27 19:18 UTC (permalink / raw)
  To: libc-alpha



On 27/07/22 14:10, Yann Droneaud wrote:
> Hi,
> 
> Le 27/07/2022 à 17:57, Adhemerval Zanella Netto via Libc-alpha a écrit :
>>
>> On 27/07/22 12:36, Florian Weimer wrote:
>>> * Adhemerval Zanella Netto:
>>>
>>>>>>> By the way, I think we should switch to the standard arc4random_uniform
>>>>>>> implementation that doesn't try to conserve bits.  It's cheaper to get a
>>>>>>> larger number of bits once instead of obtaining 24 bits first, then 8
>>>>>>> bits.
>>>>>> Yeah, it should simpler indeed.  But I do not consider this urgent
>>>>>> for 2.36.
>>>>> Well, the standard way probably has more obvious statistical properties
>>>>> and is harder to screw up. 8-/
>>>> Right, do you consider this a blocker? I think I can send a patch to
>>>> use a simpler algorithm.
>>> I like it because it would minimize risk.  It's not a strict blocker.
>>> I'm not sure if I'll be able to work on it this week.  Maybe I can
>>> review a patch.
>> Without trying to be clever, what about something like the below, adapted
>> from Bitmask with Rejection [1].  It should be fast on most case since it
>> avoid the modulo operation, and the last part that tries to reuse the
>> extra entropy bits are not strictly required.
>>
>>
>> uint32_t
>> __arc4random_uniform (uint32_t n)
>> {
>>    if (n <= 1)
>>      /* There is no valid return value for a zero limit, and 0 is the
>>         only possible result for limit 1.  */
>>      return 0;
>>
>>    /* mask is the smallest power of 2 minus 1 which is larger than n.  */
>>    int z = __builtin_clz (n);
> 
> 
> For the special case where n is power of two, I think it should be __builtin_clz (n - 1).
> 
> For example n = 8, arc4random_uniform() returns a value up to 7, but mask would be 0x1111, while 0x111 would have been enough.

Yeah, it makes sense. I will send a patch to use this method.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-07-27 19:18 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-27 12:06 [PATCH] stdlib: Tuned down tst-arc4random-thread internal parameters Adhemerval Zanella
2022-07-27 12:25 ` Florian Weimer
2022-07-27 12:29   ` Adhemerval Zanella Netto
2022-07-27 12:42     ` Florian Weimer
2022-07-27 13:00       ` Adhemerval Zanella Netto
2022-07-27 15:36         ` Florian Weimer
2022-07-27 15:57           ` Adhemerval Zanella Netto
2022-07-27 17:10             ` Yann Droneaud
2022-07-27 19:18               ` Adhemerval Zanella Netto

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).