Re: [PATCH 2/2] Add single-threaded fast path to rand()

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: "Cristian Rodríguez" <cristian@rodriguez.im>,
	"Florian Weimer" <fweimer@redhat.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
	Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
	GNU C Library <libc-alpha@sourceware.org>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>
Subject: Re: [PATCH 2/2] Add single-threaded fast path to rand()
Date: Thu, 21 Mar 2024 10:35:57 -0400	[thread overview]
Message-ID: <8cf180c9-f7e9-4a32-86a4-a5780131ec87@efficios.com> (raw)
In-Reply-To: <CAPBLoAeM4gv2cdh33fVccW7CJUWwPvAj1-4jmg8KXzR7vjMK0Q@mail.gmail.com>

On 2024-03-21 09:33, Cristian Rodríguez wrote:
> On Thu, Mar 21, 2024 at 4:39 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Adhemerval Zanella Netto:
>>
>>> On 20/03/24 11:18, Cristian Rodríguez wrote:
>>>> On Wed, Mar 20, 2024 at 9:31 AM Adhemerval Zanella Netto
>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>
>>>>
>>>>> I don't have a strong opinion on this rand patch, if this idea is to
>>>>> have it as an workbench for a possible single-thread lock optimization
>>>>> it should be fine.  It is just that I don't see much gain in optimizing
>>>>> such a bad interface (although we still lack a proper userland PRNG).
>>>>
>>>> Yeah, it should be no surprise this interfaces are bad,
>>>> I thought this was common knowledge.
>>>>
>>>> we need something like https://github.com/C2SP/C2SP/blob/main/chacha8rand.md
>>>> which prety much outperforms even non-CS algorithms in at least 64 bit x86.
>>>> but the question of the state remains.global? TLS? how to discard it
>>>> in all the appropriate occasions?
>>>
>>> And this is the arc4random in userspace discussion all over again.
>>
>> Agreed.  But what has changed that we know now that Linux won't provide
>> us with vDSO acceleration for arc4random.  So I think it wouldn't be
>> unreasonable to roll our own.  Right now, the switch to arc4random
>> provided by glibc is a massive performance regression compared to other
>> implementations.
> 
> 
> Afaik the other path that hasn't been tried is rseq no ? could the
> kernel provide a random state this way that it is cleared on the right
> conditions..

Letting rseq know about userspace random state would add a lot of coupling
between kernel and userspace. I wonder if we can achieve the speed up you
are looking for using existing rseq/membarrier features.

At the last GNU Cauldron, I created a biased locking prototype for Florian
based on rseq:

https://github.com/compudj/librseq/tree/biased-lock
https://github.com/compudj/librseq/blob/biased-lock/tests/rseq_biased_lock.h
https://github.com/compudj/librseq/blob/biased-lock/tests/rseq_biased_lock.c
https://github.com/compudj/librseq/blob/biased-lock/tests/test_rseq_biased_lock.c

The original purpose of this prototype was to replace the FILE lock
by a biased lock, which acts as a very fast lock (a small rseq assembly
sequence of loads, comparisons, stores) as long as the thread accessing the
FILE is the thread owner thread whom created it.

If another thread attempts to access the FILE, then the internal state machine
of the biased lock transitions from RSEQ_BIASED_LOCK_STATE_ST to
RSEQ_BIASED_LOCK_STATE_MT_STARTED, issues a membarrier RSEQ_PRIVATE_EXPEDITED
(also referred to as "rseq fence") to abort all pre-existing rseq critical
sections, and then transition to RSEQ_BIASED_LOCK_STATE_MT_READY before taking
the lock.

After the biased lock has transitioned to RSEQ_BIASED_LOCK_STATE_MT_READY
it is meant to stay in that state forever. So as long as only the owner
thread uses the lock, it stays fast. This means that even a multi-threaded
process with short-lived FILE that are only used by a single thread in their
lifetime will benefit from the biased lock fast-path.

I suspect we could adapt this scheme to the srand()/rand() lock. Here is how
I see this unfold for both srand() and rand():

   - On first use (if rseq biased lock owner is NULL):
     - use rseq_biased_lock_try_set_fast_thread to set lock owner with a
       compare-and-swap (expecting NULL).
   - Take the rseq biased lock:
     - This rseq biased lock will take care of transitioning to a scheme
       which is MT-aware if it detects that the thread trying to access
       the biased lock is not the owner.

So this scheme applied to both srand() and rand() should take care of making
things really fast as long as only the biased lock owner thread uses
srand()/rand().

Its overhead would be a single system call (membarrier) issued the first
time the biased lock is accessed from a thread which differs from the
first thread which used the lock.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

next prev parent reply	other threads:[~2024-03-21 14:35 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-18 15:20 Wilco Dijkstra
2024-03-18 19:24 ` Cristian Rodríguez
2024-03-19 15:44   ` Wilco Dijkstra
2024-03-20 12:31     ` Adhemerval Zanella Netto
2024-03-20 14:18       ` Cristian Rodríguez
2024-03-20 14:27         ` Adhemerval Zanella Netto
2024-03-21  7:39           ` Florian Weimer
2024-03-21 13:33             ` Cristian Rodríguez
2024-03-21 14:35               ` Mathieu Desnoyers [this message]
2024-03-21 15:07                 ` Szabolcs Nagy
2024-03-21 15:18                   ` Wilco Dijkstra
2024-03-21 15:53               ` Adhemerval Zanella Netto
2024-03-22 14:27                 ` Zack Weinberg
2024-03-22 14:46                   ` Adhemerval Zanella Netto
2024-03-22 15:30                     ` Zack Weinberg
2024-03-22 18:05                       ` Adhemerval Zanella Netto
2024-03-22 19:47                         ` Mathieu Desnoyers
2024-03-22 22:54                           ` Cristian Rodríguez
2024-03-23 14:01                           ` Zack Weinberg
2024-03-23 15:23                             ` Mathieu Desnoyers
2024-03-25 14:09                               ` Mathieu Desnoyers
2024-03-25 17:52                                 ` Adhemerval Zanella Netto
2024-03-24  0:59                             ` Cristian Rodríguez
2024-03-25  6:44                             ` Florian Weimer
2024-03-20 14:28         ` Wilco Dijkstra
2024-03-20 14:40           ` Xi Ruoyao
  -- strict thread matches above, loose matches on Subject: below --
2023-11-28 17:37 Wilco Dijkstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8cf180c9-f7e9-4a32-86a4-a5780131ec87@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=Jason@zx2c4.com \
    --cc=Wilco.Dijkstra@arm.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=cristian@rodriguez.im \
    --cc=fweimer@redhat.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).