arc4random - are you sure we want these?

* arc4random - are you sure we want these?
       [not found] <YtwgTySJyky0OcgG@zx2c4.com>
@ 2022-07-23 16:25 ` Jason A. Donenfeld
  2022-07-23 17:18   ` Paul Eggert
                     ` (4 more replies)
  0 siblings, 5 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-23 16:25 UTC (permalink / raw)
  To: libc-alpha, Adhemerval Zanella Netto, Florian Weimer,
	Yann Droneaud, jann, Michael

[Resending to right address.]

Hi glibc developers,

I learned about the addition of the arc4random functions in glibc this
morning, thanks to Phoronix. I wish somebody would have CC'd me into
those discussions before it got committed, but here we are.

I really wonder whether this is a good idea, whether this is something
that glibc wants, and whether it's a design worth committing to in the
long term.

Firstly, for what use cases does this actually help? As of recent
changes to the Linux kernels -- now backported all the way to 4.9! --
getrandom() and /dev/urandom are extremely fast and operate over per-cpu
states locklessly. Sure you avoid a syscall by doing that in userspace,
but does it really matter? Who exactly benefits from this?

Seen that way, it seems like a lot of complexity for nothing, and
complexity that will lead to bugs and various oversights eventually.

For example, the kernel reseeds itself when virtual machines fork using
an identifier passed to the kernel via ACPI. It also reseeds itself on
system resume, both from ordinary S3 sleep but also, more importantly,
from hibernation. And in general, being the arbiter of entropy, the
kernel is much better poised to determine when it makes sense to reseed.

Glibc, on the other hand, can employ some heuristics and make some
decisions -- on fork, after 16 MiB, and the like -- but in general these
are lacking, compared to the much wider array of information the kernel
has.
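
To make the comparison concrete, here is a rough sketch of the kind of
userspace heuristic described above. It is not glibc's actual code: the
struct, the function names, and the getpid()-based fork check are
assumptions made purely for illustration, and only the 16 MiB figure
comes from the text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Reseed budget taken from the "after 16 MiB" heuristic mentioned above. */
#define RESEED_AFTER_BYTES ((size_t) 16 * 1024 * 1024)

/* Hypothetical bookkeeping for a userspace generator. */
struct rng_state
{
  pid_t owner;        /* process that last seeded the state */
  size_t bytes_left;  /* output still allowed before the next reseed */
};

/* Record that the state was just seeded by the current process. */
static void
rng_mark_seeded (struct rng_state *s)
{
  s->owner = getpid ();
  s->bytes_left = RESEED_AFTER_BYTES;
}

/* True if the state must be reseeded before producing WANT more bytes. */
static bool
rng_needs_reseed (const struct rng_state *s, size_t want)
{
  if (s->owner != getpid ())   /* assumed fork check: pid changed */
    return true;
  return want > s->bytes_left; /* byte budget exhausted */
}

/* Charge USED bytes of output against the budget, saturating at zero. */
static void
rng_account (struct rng_state *s, size_t used)
{
  if (used > s->bytes_left)
    used = s->bytes_left;
  s->bytes_left -= used;
}
```

Neither check can observe a VM fork or a resume from hibernation: the
pid and the byte counter are unchanged in both cases, which is the gap
being pointed out here.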

You miss out on this with arc4random, and if that information _is_ to be
exported to userspace somehow in the future, it would be awfully nice to
design the userspace interface alongside the kernel one.

For that reason, past discussion of having some random number generation
in userspace libcs has geared toward doing this in the vDSO, somehow,
where the kernel can be part and parcel of that effort.

Seen from this perspective, going with OpenBSD's older paradigm might be
rather limiting. Why not work together, between the kernel and libc, to
see if we can come up with something better, before settling on an
interface with semantics that are hard to walk back later?

As-is, it's hard to recommend that anybody really use these functions.
Just keep using getrandom(2), which has mostly favorable semantics.
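
For reference, a minimal sketch of what calling getrandom(2) directly
looks like from C. The helper name fill_random is hypothetical; the
sketch assumes Linux with a glibc that declares getrandom() in
<sys/random.h> (2.25 or later).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: fill BUF with LEN bytes from the kernel RNG.
   Returns 0 on success, -1 with errno set on failure.  */
static int
fill_random (void *buf, size_t len)
{
  unsigned char *p = buf;

  while (len > 0)
    {
      /* flags == 0: block until the kernel RNG has been initialized.  */
      ssize_t n = getrandom (p, len, 0);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* interrupted by a signal: retry */
          return -1;
        }
      /* Large requests may be returned short; keep going.  */
      p += n;
      len -= (size_t) n;
    }
  return 0;
}
```

A caller would write something like fill_random (key, sizeof key) and
check for a nonzero return.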

Yes, I get it: it's fun to make a random number generator, and so lots
of projects figure out some way to make yet another one somewhere
somehow. But the tendency to do so feels like a weird computer tinkerer
disease rather than something that has ever helped the overall ecosystem.

So I'm wondering: who actually needs this, and why? What's the
performance requirement like, and why is getrandom(2) insufficient? And
is this really the best approach to take? If this is something needed,
how would you feel about working together on a vDSO approach instead? Or
maybe nobody actually needs this in the first place?

And secondly, is there any way that glibc can *not* do this, or has that
ship fully sailed, and I really missed out by not being part of that
discussion whenever it was happening?

Thanks,
Jason


* Re: arc4random - are you sure we want these?
  2022-07-23 16:25 ` arc4random - are you sure we want these? Jason A. Donenfeld
@ 2022-07-23 17:18   ` Paul Eggert
  2022-07-24 23:55     ` Jason A. Donenfeld
  2022-07-23 17:39   ` Adhemerval Zanella Netto
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 81+ messages in thread
From: Paul Eggert @ 2022-07-23 17:18 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: libc-alpha

On 7/23/22 09:25, Jason A. Donenfeld via Libc-alpha wrote:
> it's hard to recommend that anybody really use these functions.
> Just keep using getrandom(2), which has mostly favorable semantics.

Yes, that's what I plan to do in GNU projects like Coreutils and Emacs.

Although I don't recommend arc4random, I suppose it was added for 
source-code compatibility with the BSDs (I wasn't involved in the decision).

> is there any way that glibc can *not* do this, or has that
> ship fully sailed

It hasn't fully sailed since we haven't done a release.

> it's fun to make a random number generator, and so lots
> of projects figure out some way to make yet another one somewhere
> somehow.

That's a bit harsh. Coreutils still has its own random number generator 
because it needed to be portable to a bunch of platforms and there was 
no standard. Eventually we'll rip it out but there's no rush. Having 
written much of that code I can reliably assert that it was not fun.


* Re: arc4random - are you sure we want these?
  2022-07-23 17:18   ` Paul Eggert
@ 2022-07-24 23:55     ` Jason A. Donenfeld
  2022-07-25 20:31       ` Paul Eggert
  0 siblings, 1 reply; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-24 23:55 UTC (permalink / raw)
  To: Paul Eggert; +Cc: libc-alpha, linux-crypto

Hi Paul,

Sorry I missed your reply earlier. I'm not a subscriber so I missed this
as I somehow fell out of the CC.

On Sat, Jul 23, 2022 at 05:18:05PM +0000, Paul Eggert wrote:
> On 7/23/22 09:25, Jason A. Donenfeld via Libc-alpha wrote:
> > it's hard to recommend that anybody really use these functions.
> > Just keep using getrandom(2), which has mostly favorable semantics.
> 
> Yes, that's what I plan to do in GNU projects like Coreutils and Emacs.
> 
> Although I don't recommend arc4random, I suppose it was added for 
> source-code compatibility with the BSDs (I wasn't involved in the decision).

Source code compatibility isn't exactly a bad goal. But according to
Adhemerval you don't plan on this being a secure thing -- hence the
note to that effect in the documentation, as he mentioned -- so it seems
like a maybe-okay goal gone bad. But, anyway, if the goal is just basic
source code compatibility, back it with simple calls to getrandom() to
start, and if later there are performance issues (big if!), we can look
into vDSO tricks and such to speed that up. There's no need to add a
whole new huge fraught mechanism for that.
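
As an illustration of "back it with simple calls to getrandom()", a
minimal sketch follows. The sketch_ prefix is hypothetical and is used
to avoid clashing with any real arc4random declarations; aborting on an
unexpected error is an assumption, chosen because the arc4random
interface has no way to report failure.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical arc4random_buf look-alike: no userspace state at all,
   every byte comes straight from getrandom().  */
void
sketch_arc4random_buf (void *buf, size_t len)
{
  unsigned char *p = buf;

  while (len > 0)
    {
      ssize_t n = getrandom (p, len, 0);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;   /* interrupted by a signal: retry */
          abort ();     /* assumption: no error return is available */
        }
      p += n;
      len -= (size_t) n;
    }
}

/* Hypothetical arc4random look-alike built on the buffer variant.  */
uint32_t
sketch_arc4random (void)
{
  uint32_t r;
  sketch_arc4random_buf (&r, sizeof r);
  return r;
}
```

With this shape there is nothing to reseed in userspace; any later
speedup (vDSO or otherwise) could be swapped in behind the same two
entry points.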

> > is there any way that glibc can *not* do this, or has that
> > ship fully sailed
> 
> It hasn't fully sailed since we haven't done a release.

Well that's good. I'd recommend just backing it out until it can be done
in a way that glibc developers feel comfortable calling safe (and others
too, of course, but at the very least you don't want to start out making
something you feel the need to warn about in the documentation).

> That's a bit harsh. Coreutils still has its own random number generator 
> because it needed to be portable to a bunch of platforms and there was 
> no standard. Eventually we'll rip it out but there's no rush. Having 
> written much of that code I can reliably assert that it was not fun.

I'm happy to help with this if you need. I recently cleaned up some
similar-sounding code in systemd for their uses; random-util.c there
might be of interest.

Jason


* Re: arc4random - are you sure we want these?
  2022-07-24 23:55     ` Jason A. Donenfeld
@ 2022-07-25 20:31       ` Paul Eggert
  0 siblings, 0 replies; 81+ messages in thread
From: Paul Eggert @ 2022-07-25 20:31 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: libc-alpha, linux-crypto

On 7/24/22 16:55, Jason A. Donenfeld wrote:

> Sorry I missed your reply earlier. I'm not a subscriber so I missed this
> as I somehow fell out of the CC.

Your email provider (Google) rejected email from cs.ucla.edu on the 
grounds that its IP address 131.179.128.68 has a "very low reputation". 
Google provided no way to appeal or fix the problem.

I am using "Reply All" for this message because Google likely won't 
deliver it to you directly. Perhaps someone else can forward it to you 
for me. (Sorry to bother the list.)

Perhaps this is a subtle way to encourage our department's faculty to 
let Google manage our email. We've resisted so far, though.


* Re: arc4random - are you sure we want these?
  2022-07-23 16:25 ` arc4random - are you sure we want these? Jason A. Donenfeld
  2022-07-23 17:18   ` Paul Eggert
@ 2022-07-23 17:39   ` Adhemerval Zanella Netto
  2022-07-23 22:54     ` Jason A. Donenfeld
  2022-07-25 15:33     ` Rich Felker
  2022-07-23 19:04   ` Cristian Rodríguez
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 81+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-23 17:39 UTC (permalink / raw)
  To: Jason A. Donenfeld, libc-alpha, Florian Weimer, Yann Droneaud,
	jann, Michael, Paul Eggert

On 23/07/22 13:25, Jason A. Donenfeld wrote:
> [Resending to right address.]
> 
> Hi glibc developers,
> 
> I learned about the addition of the arc4random functions in glibc this
> morning, thanks to Phoronix. I wish somebody would have CC'd me into
> those discussions before it got committed, but here we are.

Florian sent the initial version about four years ago on libc-alpha
(libc-alpha@sourceware.org). This is the mailing list used for glibc
development, RFCs, and general discussion.

> 
> I really wonder whether this is a good idea, whether this is something
> that glibc wants, and whether it's a design worth committing to in the
> long term.

I think so; this is something developers have been asking us for since
2007 [1], and it is used on and ported to multiple OSes (OpenBSD,
FreeBSD, Mac OS X).

> 
> Firstly, for what use cases does this actually help? As of recent
> changes to the Linux kernels -- now backported all the way to 4.9! --
> getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> states locklessly. Sure you avoid a syscall by doing that in userspace,
> but does it really matter? Who exactly benefits from this?

Mainly performance, since glibc exports both getrandom and getentropy.
There was some discussion on the mailing list, and we also decided to
explicitly state in our documentation that this is not a CSPRNG.

> 
> Seen that way, it seems like a lot of complexity for nothing, and
> complexity that will lead to bugs and various oversights eventually.
> 
> For example, the kernel reseeds itself when virtual machines fork using
> an identifier passed to the kernel via ACPI. It also reseeds itself on
> system resume, both from ordinary S3 sleep but also, more importantly,
> from hibernation. And in general, being the arbiter of entropy, the
> kernel is much better poised to determine when it makes sense to reseed.
> 
> Glibc, on the other hand, can employ some heuristics and make some
> decisions -- on fork, after 16 MiB, and the like -- but in general these
> are lacking, compared to the much wider array of information the kernel
> has.
> 
> You miss out on this with arc4random, and if that information _is_ to be
> exported to userspace somehow in the future, it would be awfully nice to
> design the userspace interface alongside the kernel one.
> 
> For that reason, past discussion of having some random number generation
> in userspace libcs has geared toward doing this in the vDSO, somehow,
> where the kernel can be part and parcel of that effort.
> 
> Seen from this perspective, going with OpenBSD's older paradigm might be
> rather limiting. Why not work together, between the kernel and libc, to
> see if we can come up with something better, before settling on an
> interface with semantics that are hard to walk back later?

Mainly because there are some programs out there that can still benefit
from a widespread interface instead of relying on a not-yet-implemented
interface that will only be available in a future kernel.  But at the
same time, nothing prevents us from using a vDSO-like interface,
improving our implementation with better heuristics, or even using a
different cipher algorithm.

There is even some discussion about making arc4random fall back to
getrandom if a tunable is set or if the kernel is set up in some strict
manner.

> 
> As-is, it's hard to recommend that anybody really use these functions.
> Just keep using getrandom(2), which has mostly favorable semantics.
> 
> Yes, I get it: it's fun to make a random number generator, and so lots
> of projects figure out some way to make yet another one somewhere
> somehow. But the tendency to do so feels like a weird computer tinkerer
> disease rather than something that has ever helped the overall ecosystem.

I did not add it because it was 'fun', nor was I trying to be clever
here; my initial plan was to use a de facto implementation based on
OpenBSD exactly to avoid the pitfalls of trying to come up with a new
RNG scheme.

> 
> So I'm wondering: who actually needs this, and why? What's the
> performance requirement like, and why is getrandom(2) insufficient? And
> is this really the best approach to take? If this is something needed,
> how would you feel about working together on a vDSO approach instead? Or
> maybe nobody actually needs this in the first place?

The vDSO approach would be a good thing, and if the kernel ever provides
it I think it would be feasible to wire up arc4random to use it when the
underlying kernel supports it.  OpenBSD, for instance, has a feature to
instruct the kernel to provide random data directly to an ELF segment
[4], and they use it to seed various libc hardening features (way more
versatile than AT_RANDOM and more fail-proof than getrandom, as we saw
on some environment where).

> 
> And secondly, is there any way that glibc can *not* do this, or has that
> ship fully sailed, and I really missed out by not being part of that
> discussion whenever it was happening?

Well, we have in fact been discussing adding arc4random since Florian's
initial proposal [2], roughly 4 years ago; and the initial bug report
asking for it is from 15 years ago.

I still think it is a good addition to provide arc4random, for the same
reason we are proposing adding strlcpy [3]: developers still use such
interfaces, source-code compatibility with the BSDs might help
developers avoid rolling out their own implementations (even if some
developers do agree that these are not the best interfaces), and
focusing on one implementation might improve the general ecosystem.  As
Paul noted, coreutils has its own RNG, while having an arc4random-like
interface might free it from that (at least on glibc systems).

But in the end I think that if we are clear about it in the
documentation, and provide alternatives when users are aware of the
limitations, I do not think it is a bad decision.

> 
> Thanks,
> Jason

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=4417
[2] https://sourceware.org/pipermail/libc-alpha/2018-March/092081.html
[3] https://sourceware.org/pipermail/libc-alpha/2022-June/140093.html
[4] https://github.com/openbsd/src/blob/master/libexec/ld.so/SPECS.randomdata


* Re: arc4random - are you sure we want these?
  2022-07-23 17:39   ` Adhemerval Zanella Netto
@ 2022-07-23 22:54     ` Jason A. Donenfeld
  2022-07-25 15:33     ` Rich Felker
  1 sibling, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-23 22:54 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Florian Weimer, Yann Droneaud, jann, Michael,
	Paul Eggert, linux-crypto

Hi Adhemerval,

Thanks for your reply.

On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto wrote:
> > Firstly, for what use cases does this actually help? As of recent
> > changes to the Linux kernels -- now backported all the way to 4.9! --
> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> > states locklessly. Sure you avoid a syscall by doing that in userspace,
> > but does it really matter? Who exactly benefits from this?
> 
> Mainly performance, since glibc exports both getrandom and getentropy.

Okay so your motivation is performance. But can you tell me what your
performance goals actually are? All kernel.org stable kernels from 4.9
and upwards now have really fast per-cpu lockless implementations of
getrandom() and /dev/urandom. If your goal is performance, I would be
very, very interested to find out a circumstance where this is
insufficient.

> There was some discussion on the mailing list, and we also decided to
> explicitly state in our documentation that this is not a CSPRNG.

Okay that's all the more reason why this is a completely garbage
endeavor. Sorry for the strong language, but the last thing anybody
needs is another PRNG that's "half way" between being good for crypto
and not. If it's not good for crypto, people will use it anyway,
especially since you're winking at them saying, "oh but actually
chacha20 is fine technically so....", and then fast-forward a few years
when you realize you can lean on your non-crypto commitment and make
things different. Never underestimate the power of a poorly defined
function definition. If your goal isn't to make a real CSPRNG, why make
this kind of thing at all?

And it's especially ridiculous since the OpenBSD arc4random *is* used
for crypto. So now you've really muddied the waters. (And naturally the
OpenBSD arc4random was done in conjunction with their kernel
development, since the same people work on both, which isn't what's
happened here.)

So your "it's a CSPRNG wink wink but the documentation says not, so
actually we're off the hook for doing this well" is a cop-out that will
lead to trouble.

Going back to my original point: what are the performance requirements
that point toward a userspace RNG being required here? If it's not
actually necessary, then let's not do this. If it is necessary for some
legitimate widespread reason, then let's do this right, and actually
make something you're comfortable calling cryptographically secure. And
let's get this right from the beginning, so that the new interface
doesn't come with all sorts of caveats, "this is safe for glibc ≥ 
4.3.2.1 only", or whatever else.

Again, I'm not averse to the general concept. I just haven't seen
anything really justifying adding the complexity for it. And then
assuming that justification does exist somewhere, this approach doesn't
seem to be a particularly well planned one. As soon as you find yourself
reaching for the "documentation cop-out", something has gone amiss.

> The vDSO approach would be a good thing, and if the kernel ever provides
> it I think it would be feasible to wire up arc4random to use it when the
> underlying kernel supports it.

So if you justify the performance requirement, wouldn't it make more
sense to just back getrandom() itself with a vDSO call? So that way,
kernels with that get bits faster (but by how much, really? c'mon...),
and kernels without it have things as normal as possible.

If your concern is instances in which getrandom() can fail, I'd like to
hear what those concerns are so that interface can be fixed and
improved.

> But in the end I think that if we are clear about it in the
> documentation, and provide alternatives when users are aware of the
> limitations, I do not think it is a bad decision.

This really strikes me as an almost comically ominous expectation.
Design interfaces that don't have dangerous pitfalls. While
documentation might somehow technically absolve you of responsibility,
it doesn't actually help make the ecosystem safer by providing optimal
interfaces that don't have cop outs.

Anyway, to reiterate:

- Can you show me some concerning performance numbers on the current
  batch of kernel.org stable kernels, and the use cases for which those
  numbers are concerning, and how widespread you think those use cases
  are?

- If this really *is* necessary for some reason, can we do it well out
  of the gate, with good coordination between kernel and userland,
  instead of half-assing it initially and covering that up with a
  documentation note?

Jason


* Re: arc4random - are you sure we want these?
  2022-07-23 17:39   ` Adhemerval Zanella Netto
  2022-07-23 22:54     ` Jason A. Donenfeld
@ 2022-07-25 15:33     ` Rich Felker
  2022-07-25 15:59       ` Adhemerval Zanella Netto
                         ` (2 more replies)
  1 sibling, 3 replies; 81+ messages in thread
From: Rich Felker @ 2022-07-25 15:33 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: Jason A. Donenfeld, libc-alpha, Florian Weimer, Yann Droneaud,
	jann, Michael, Paul Eggert

On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto via Libc-alpha wrote:
> On 23/07/22 13:25, Jason A. Donenfeld wrote:
> > Firstly, for what use cases does this actually help? As of recent
> > changes to the Linux kernels -- now backported all the way to 4.9! --
> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> > states locklessly. Sure you avoid a syscall by doing that in userspace,
> > but does it really matter? Who exactly benefits from this?
> 
> Mainly performance, since glibc exports both getrandom and getentropy.
> There was some discussion on the mailing list, and we also decided to
> explicitly state in our documentation that this is not a CSPRNG.

This is an extreme documentation/specification bug that *hurts*
portability and security. The core contract of the historical
arc4random function is that it *is* a CSPRNG. Having a function by
that name that's allowed not to be one means now all software using it
has to add detection for the broken glibc variant.

If the glibc implementation has flaws that actually make it not a
CSPRNG, this absolutely needs to be fixed. Not doing so is
irresponsible and will set everyone back a long ways.

If this is just a case of trying to be "cautious" about overpromising
things, the documentation needs fixed to specify that this is a
CSPRNG. I'm particularly worried about the wording "these still use a
Pseudo-Random generator and should not be used in cryptographic
contexts". *All* CSPRNGs are PRNGs. Being pseudo-random does not make
it not cryptographically safe. The safety depends on the original
source of the entropy and the practical irreversibility and other
cryptographic properties of the extension function. The fact that this
has been stated so poorly in the documentation really has me worried
that someone does not understand the issues. I haven't dug into the
list mails or actual code to determine to what extent that's the case,
but it's really, *really* worrying.

Rich


* Re: arc4random - are you sure we want these?
  2022-07-25 15:33     ` Rich Felker
@ 2022-07-25 15:59       ` Adhemerval Zanella Netto
  2022-07-25 17:41         ` Rich Felker
  2022-07-25 16:18       ` Sandy Harris
  2022-07-25 16:40       ` Florian Weimer
  2 siblings, 1 reply; 81+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-25 15:59 UTC (permalink / raw)
  To: Rich Felker
  Cc: Jason A. Donenfeld, libc-alpha, Florian Weimer, Yann Droneaud,
	jann, Michael, Paul Eggert



On 25/07/22 12:33, Rich Felker wrote:
> 
> If this is just a case of trying to be "cautious" about overpromising
> things, the documentation needs fixed to specify that this is a
> CSPRNG. I'm particularly worried about the wording "these still use a
> Pseudo-Random generator and should not be used in cryptographic
> contexts". *All* CSPRNGs are PRNGs. Being pseudo-random does not make
> it not cryptographically safe. The safety depends on the original
> source of the entropy and the practical irreversibility and other
> cryptographic properties of the extension function. The fact that this
> has been stated so poorly in the documentation really has me worried
> that someone does not understand the issues. I haven't dug into the
> list mails or actual code to determine to what extent that's the case,
> but it's really, *really* worrying.

That's the main reason to avoid calling it a CSPRNG, since neither
Florian nor I have enough security expertise to certify that the current
scheme actually meets all the requirements.  It does follow the OpenBSD
strategy of a fast-key-erasure random-number generator, although all
key-reseeding strategies are basically heuristics.

If I understand Jason's argument correctly, unless we have a kernel API
that actually handles the buffer (so it can reseed or clear it when it
sees fit), there is no point in providing a CSPRNG in userspace; use
getrandom instead.




* Re: arc4random - are you sure we want these?
  2022-07-25 15:59       ` Adhemerval Zanella Netto
@ 2022-07-25 17:41         ` Rich Felker
  0 siblings, 0 replies; 81+ messages in thread
From: Rich Felker @ 2022-07-25 17:41 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha,
	Michael, jann

On Mon, Jul 25, 2022 at 12:59:39PM -0300, Adhemerval Zanella Netto via Libc-alpha wrote:
> 
> 
> On 25/07/22 12:33, Rich Felker wrote:
> > 
> > If this is just a case of trying to be "cautious" about overpromising
> > things, the documentation needs fixed to specify that this is a
> > CSPRNG. I'm particularly worried about the wording "these still use a
> > Pseudo-Random generator and should not be used in cryptographic
> > contexts". *All* CSPRNGs are PRNGs. Being pseudo-random does not make
> > it not cryptographically safe. The safety depends on the original
> > source of the entropy and the practical irreversibility and other
> > cryptographic properties of the extension function. The fact that this
> > has been stated so poorly in the documentation really has me worried
> > that someone does not understand the issues. I haven't dug into the
> > list mails or actual code to determine to what extent that's the case,
> > but it's really, *really* worrying.
> 
> That's the main reason to avoid calling it a CSPRNG, since neither
> Florian nor I have enough security expertise to certify that the current
> scheme actually meets all the requirements.  It does follow the OpenBSD
> strategy of a fast-key-erasure random-number generator, although all
> key-reseeding strategies are basically heuristics.

I think the core problem here is that, in making an implementation of
a widely agreed-upon historical function with an existing working
definition of what "cryptographically secure" means of a PRNG, you're
instead positing a possibly-different definition of "CS" and saying
"it might not be CS by this new definition". This does genuine harm to
understanding of an area developers and users already understand very
very poorly.

The documentation should state that it's cryptographically secure in
the sense normally meant for arc4random, which includes not falsely
returning with "success" at early boot (no GRND_INSECURE or AT_RANDOM
fallback), but that this does not necessarily include any guarantees
about what happens in a program with undefined behavior ("hardening"
properties) or things like actively trying to prevent you from cloning
state (VM freeze/resume stuff, etc.)
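
The early-boot distinction can be seen from C with a small probe. This
is only a sketch: the function name is hypothetical, and it relies on
the documented Linux behavior that getrandom() with flags 0 blocks until
the kernel RNG is initialized, while GRND_NONBLOCK fails with EAGAIN
instead of blocking.

```c
#include <errno.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical probe.  Returns 1 if the kernel RNG is initialized,
   0 if a blocking getrandom() call would currently have to wait,
   and -1 on any other error (for example ENOSYS).  */
int
kernel_rng_ready (void)
{
  unsigned char probe;

  for (;;)
    {
      ssize_t n = getrandom (&probe, sizeof probe, GRND_NONBLOCK);
      if (n == (ssize_t) sizeof probe)
        return 1;       /* got a byte: the pool is initialized */
      if (n < 0 && errno == EAGAIN)
        return 0;       /* not yet seeded: flags == 0 would block */
      if (n < 0 && errno == EINTR)
        continue;       /* interrupted by a signal: retry */
      return -1;
    }
}
```

An implementation that honors the contract described above would simply
call getrandom() with flags 0 and wait, rather than report success from
a source that may not be seeded yet.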

> If I understand Jason's argument correctly, unless we have a kernel API
> that actually handles the buffer (so it can reseed or clear it when it
> sees fit), there is no point in providing a CSPRNG in userspace; use
> getrandom instead.

As for me, I am in favor of having the interface, and would be fine
with having it just wrap getentropy as an unlimited-length version
thereof. The value is in having a commonly agreed upon API with common
guarantees so as not to promote YOLO NIH of critical stuff like safe
fallbacks for entropy.
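
A minimal sketch of that "unlimited-length getentropy" idea, assuming
glibc's getentropy() as declared in <sys/random.h> and its documented
256-byte limit per call. The function name is hypothetical, and
aborting on failure is an assumption, since an arc4random_buf-style
interface cannot return an error.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/random.h>

/* getentropy() rejects requests larger than 256 bytes.  */
#define ENTROPY_CHUNK 256

/* Hypothetical wrapper: fill BUF with LEN bytes by calling
   getentropy() in chunks of at most ENTROPY_CHUNK bytes.  */
void
getentropy_unlimited (void *buf, size_t len)
{
  unsigned char *p = buf;

  while (len > 0)
    {
      size_t n = len < ENTROPY_CHUNK ? len : ENTROPY_CHUNK;
      if (getentropy (p, n) != 0)
        abort ();       /* assumption: no way to report failure */
      p += n;
      len -= n;
    }
}
```

All the guarantees then come from the kernel; the libc side carries no
generator state of its own.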

Rich


* Re: arc4random - are you sure we want these?
  2022-07-25 15:33     ` Rich Felker
  2022-07-25 15:59       ` Adhemerval Zanella Netto
@ 2022-07-25 16:18       ` Sandy Harris
  2022-07-25 16:40       ` Florian Weimer
  2 siblings, 0 replies; 81+ messages in thread
From: Sandy Harris @ 2022-07-25 16:18 UTC (permalink / raw)
  To: Rich Felker
  Cc: Adhemerval Zanella Netto, Jason A. Donenfeld, libc-alpha,
	Florian Weimer, Yann Droneaud, Jann Horn, Michael, Paul Eggert,
	Linux Crypto Mailing List

Rich Felker <dalias@libc.org> wrote:

> This is an extreme documentation/specification bug that *hurts*
> portability and security. The core contract of the historical
> arc4random function is that it *is* a CSPRNG. Having a function by
> that name that's allowed not to be one means now all software using it
> has to add detection for the broken glibc variant.
>
> If the glibc implementation has flaws that actually make it not a
> CSPRNG, this absolutely needs to be fixed. Not doing so is
> irresponsible and will set everyone back a long ways.

Exactly!


* Re: arc4random - are you sure we want these?
  2022-07-25 15:33     ` Rich Felker
  2022-07-25 15:59       ` Adhemerval Zanella Netto
  2022-07-25 16:18       ` Sandy Harris
@ 2022-07-25 16:40       ` Florian Weimer
  2022-07-25 16:49         ` Adhemerval Zanella Netto
                           ` (2 more replies)
  2 siblings, 3 replies; 81+ messages in thread
From: Florian Weimer @ 2022-07-25 16:40 UTC (permalink / raw)
  To: Rich Felker
  Cc: Adhemerval Zanella Netto, Jason A. Donenfeld, libc-alpha,
	Yann Droneaud, jann, Michael, Paul Eggert

* Rich Felker:

> On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto via Libc-alpha wrote:
>> On 23/07/22 13:25, Jason A. Donenfeld wrote:
>> > Firstly, for what use cases does this actually help? As of recent
>> > changes to the Linux kernels -- now backported all the way to 4.9! --
>> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
>> > states locklessly. Sure you avoid a syscall by doing that in userspace,
>> > but does it really matter? Who exactly benefits from this?
>> 
>> Mainly performance, since glibc exports both getrandom and getentropy.
>> There was some discussion on the mailing list, and we also decided to
>> explicitly state in our documentation that this is not a CSPRNG.
>
> This is an extreme documentation/specification bug that *hurts*
> portability and security. The core contract of the historical
> arc4random function is that it *is* a CSPRNG. Having a function by
> that name that's allowed not to be one means now all software using it
> has to add detection for the broken glibc variant.
>
> If the glibc implementation has flaws that actually make it not a
> CSPRNG, this absolutely needs to be fixed. Not doing so is
> irresponsible and will set everyone back a long ways.

The core issue is that on some kernels/architectures, reading from
/dev/urandom can degrade to GRND_INSECURE (approximately), and while the
result is likely still unpredictable, not everyone would label that as a
CSPRNG.

If we document arc4random as a CSPRNG, this means that we would have to
ditch the fallback code and abort the process if the getrandom system
call is not available: when reading from /dev/urandom as a fallback, we
have no way of knowing if we are in any of the impacted execution
environments.  Based on your other comments, it seems that you are
interested in such fallbacks, too, but I don't think you can actually
have both (CSPRNG + fallback).

And then there is the certification issue.  We really want applications
that already use OpenSSL for other cryptography to use RAND_bytes
instead of arc4random.  Likewise for GNUTLS and gnutls_rnd.  What should
authors of those cryptographic libraries do?  That's less clear, and really
depends on the constraints they operate in (e.g., they may target only a
subset of architectures and kernel versions).

Thanks,
Florian

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: arc4random - are you sure we want these?
  2022-07-25 16:40       ` Florian Weimer
@ 2022-07-25 16:49         ` Adhemerval Zanella Netto
  2022-07-25 16:51         ` Jason A. Donenfeld
  2022-07-25 17:44         ` Rich Felker
  2 siblings, 0 replies; 81+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-25 16:49 UTC (permalink / raw)
  To: Florian Weimer, Rich Felker
  Cc: Jason A. Donenfeld, libc-alpha, Yann Droneaud, jann, Michael,
	Paul Eggert



On 25/07/22 13:40, Florian Weimer wrote:
> * Rich Felker:
> 
>> On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto via Libc-alpha wrote:
>>> On 23/07/22 13:25, Jason A. Donenfeld wrote:
>>>> Firstly, for what use cases does this actually help? As of recent
>>>> changes to the Linux kernels -- now backported all the way to 4.9! --
>>>> getrandom() and /dev/urandom are extremely fast and operate over per-cpu
>>>> states locklessly. Sure you avoid a syscall by doing that in userspace,
>>>> but does it really matter? Who exactly benefits from this?
>>>
>>> Mainly performance, since glibc both export getrandom and getentropy. 
>>> There were some discussion on maillist and we also decided to explicit
>>> state this is not a CSRNG on our documentation.
>>
>> This is an extreme documentation/specification bug that *hurts*
>> portability and security. The core contract of the historical
>> arc4random function is that it *is* a CSPRNG. Having a function by
>> that name that's allowed not to be one means now all software using it
>> has to add detection for the broken glibc variant.
>>
>> If the glibc implementation has flaws that actually make it not a
>> CSPRNG, this absolutely needs to be fixed. Not doing so is
>> irresponsible and will set everyone back a long ways.
> 
> The core issue is that on some kernels/architectures, reading from
> /dev/urandom can degrade to GRND_INSECURE (approximately), and while the
> result is likely still unpredictable, not everyone would label that as a
> CSPRNG.
> 
> If we document arc4random as a CSPRNG, this means that we would have to
> ditch the fallback code and abort the process if the getrandom system
> call is not available: when reading from /dev/urandom as a fallback, we
> have no way of knowing if we are in any of the impacted execution
> environments.  Based on your other comments, it seems that you are
> interested in such fallbacks, too, but I don't think you can actually
> have both (CSPRNG + fallback).

It seems the best course of action, especially given that documenting
arc4random as a CSPRNG seems to be a deal-breaker.

> 
> And then there is the certification issue.  We really want applications
> that already use OpenSSL for other cryptography to use RAND_bytes
> instead of arc4random.  Likewise for GNUTLS and gnutls_rnd.  What should
> authors of those cryptographic libraries?  That's less clear, and really
> depends on the constraints they operate in (e.g., they may target only a
> subset of architectures and kernel versions).
> 
> Thanks,
> Florian
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: arc4random - are you sure we want these?
  2022-07-25 16:40       ` Florian Weimer
  2022-07-25 16:49         ` Adhemerval Zanella Netto
@ 2022-07-25 16:51         ` Jason A. Donenfeld
  2022-07-25 17:44         ` Rich Felker
  2 siblings, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-25 16:51 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Rich Felker, Adhemerval Zanella Netto, libc-alpha, Yann Droneaud,
	jann, Michael, Paul Eggert, linux-crypto

Hi Florian,

On Mon, Jul 25, 2022 at 06:40:54PM +0200, Florian Weimer wrote:
> The core issue is that on some kernels/architectures, reading from
> /dev/urandom can degrade to GRND_INSECURE (approximately), and while the
> result is likely still unpredictable, not everyone would label that as a
> CSPRNG.

On some old kernels (though I think not all?), you can poll on
/dev/random. This isn't perfect, as the ancient "non blocking pool"
initialized after the "blocking pool", but it's not too imperfect
either. Take a look at the previously linked random-util.c.

> If we document arc4random as a CSPRNG, this means that we would have to
> ditch the fallback code and abort the process if the getrandom system
> call is not available: when reading from /dev/urandom as a fallback, we
> have no way of knowing if we are in any of the impacted execution
> environments.  Based on your other comments, it seems that you are
> interested in such fallbacks, too, but I don't think you can actually
> have both (CSPRNG + fallback).
> 
> And then there is the certification issue.  We really want applications
> that already use OpenSSL for other cryptography to use RAND_bytes
> instead of arc4random.  Likewise for GNUTLS and gnutls_rnd.  What should
> authors of those cryptographic libraries?  That's less clear, and really
> depends on the constraints they operate in (e.g., they may target only a
> subset of architectures and kernel versions).

I think all of this is yet another indication that there are some major
things to work out -- should we block or not? is buffering safe? is the
interface correct? -- and so we should just back out the arc4random
commit until this has been explored a bit more. We're not gaining
anything from rushing this, especially as a "source code compatibility"
thing, if there's not even agreement between OSes on what the function
does inside.

Jason

PS: please try to keep linux-crypto@vger.kernel.org CC'd. I've been
bouncing these manually when not, but it's hard to keep up with that.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: arc4random - are you sure we want these?
  2022-07-25 16:40       ` Florian Weimer
  2022-07-25 16:49         ` Adhemerval Zanella Netto
  2022-07-25 16:51         ` Jason A. Donenfeld
@ 2022-07-25 17:44         ` Rich Felker
  2022-07-25 18:33           ` Cristian Rodríguez
  2 siblings, 1 reply; 81+ messages in thread
From: Rich Felker @ 2022-07-25 17:44 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Yann Droneaud, Jason A. Donenfeld, libc-alpha, Michael, jann

On Mon, Jul 25, 2022 at 06:40:54PM +0200, Florian Weimer via Libc-alpha wrote:
> * Rich Felker:
> 
> > On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto via Libc-alpha wrote:
> >> On 23/07/22 13:25, Jason A. Donenfeld wrote:
> >> > Firstly, for what use cases does this actually help? As of recent
> >> > changes to the Linux kernels -- now backported all the way to 4.9! --
> >> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> >> > states locklessly. Sure you avoid a syscall by doing that in userspace,
> >> > but does it really matter? Who exactly benefits from this?
> >> 
> >> Mainly performance, since glibc both export getrandom and getentropy. 
> >> There were some discussion on maillist and we also decided to explicit
> >> state this is not a CSRNG on our documentation.
> >
> > This is an extreme documentation/specification bug that *hurts*
> > portability and security. The core contract of the historical
> > arc4random function is that it *is* a CSPRNG. Having a function by
> > that name that's allowed not to be one means now all software using it
> > has to add detection for the broken glibc variant.
> >
> > If the glibc implementation has flaws that actually make it not a
> > CSPRNG, this absolutely needs to be fixed. Not doing so is
> > irresponsible and will set everyone back a long ways.
> 
> The core issue is that on some kernels/architectures, reading from
> /dev/urandom can degrade to GRND_INSECURE (approximately), and while the
> result is likely still unpredictable, not everyone would label that as a
> CSPRNG.

Then don't fall back to /dev/urandom. It's not even a failsafe fallback
anyway (ENFILE, EMFILE, sandboxes, etc.) so it can't safely be used
here. Instead use SYS_sysctl and poll for entropy_avail, looping until
it's ready. AFAICT this works reliably on all kernels as far back as
glibc supports (assuming nothing idiotic like intentionally patching
or configuring out random support, but then it's PEBKAC error, as no
distros did this).

Rich

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: arc4random - are you sure we want these?
  2022-07-25 17:44         ` Rich Felker
@ 2022-07-25 18:33           ` Cristian Rodríguez
  2022-07-25 18:49             ` Rich Felker
  0 siblings, 1 reply; 81+ messages in thread
From: Cristian Rodríguez @ 2022-07-25 18:33 UTC (permalink / raw)
  To: Rich Felker
  Cc: Florian Weimer, Yann Droneaud, jann, Jason A. Donenfeld,
	libc-alpha, Michael

On Mon, Jul 25, 2022 at 1:44 PM Rich Felker <dalias@libc.org> wrote:

> Then don't fallback to /dev/urandom.

Those are my thoughts as well, but with __libc_fatal() if there is no
usable getrandom syscall with the needed semantics; in short, making
this interface usable only when the kernel is.

This is quite drastic, but probably the only sane way to go.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: arc4random - are you sure we want these?
  2022-07-25 18:33           ` Cristian Rodríguez
@ 2022-07-25 18:49             ` Rich Felker
  2022-07-27  1:54               ` Theodore Ts'o
  0 siblings, 1 reply; 81+ messages in thread
From: Rich Felker @ 2022-07-25 18:49 UTC (permalink / raw)
  To: Cristian Rodríguez
  Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha,
	Michael, jann

On Mon, Jul 25, 2022 at 02:33:05PM -0400, Cristian Rodríguez via Libc-alpha wrote:
> On Mon, Jul 25, 2022 at 1:44 PM Rich Felker <dalias@libc.org> wrote:
> 
> > Then don't fallback to /dev/urandom.
> 
> Those are my thoughts as well.. but __libc_fatal() if there is no
> usable getrandom syscall with the needed semantics, in short making
> this interface usable only when the kernel is.
> 
> This is quite drastic, but probably the only sane way to go.

You can at least try the sysctl and possibly also /dev approaches and
only treat this as fatal as a last resort. If you can inspect
entropy_avail or poll /dev/random to determine that the pool is
initialized this is very safe, I think. And some research on distro
practices might uncover whether this should be believed to be
complete.
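
To make the entropy_avail idea concrete, here is a rough sketch.  It
reads the value through /proc rather than through the sysctl system
call, purely to keep the example short, so it does not cover setups
where /proc is unavailable.  The threshold ENTROPY_READY_BITS and both
function names are assumptions made up for this example; which value
actually means "initialized" depends on the kernel version.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical threshold, in bits, for treating the pool as ready.  */
#define ENTROPY_READY_BITS 128

/* Return the kernel's entropy estimate in bits, or -1 on error.  */
static int
read_entropy_avail (void)
{
  FILE *f = fopen ("/proc/sys/kernel/random/entropy_avail", "r");
  if (f == NULL)
    return -1;

  int bits = -1;
  if (fscanf (f, "%d", &bits) != 1)
    bits = -1;
  fclose (f);
  return bits;
}

/* Poll the estimate up to MAX_TRIES times, sleeping 100 ms between
   attempts.  Returns 0 when the threshold is reached, 1 if it was not
   reached in time, and -1 if the estimate cannot be read.  */
static int
wait_for_entropy (int max_tries)
{
  const struct timespec delay = { .tv_sec = 0, .tv_nsec = 100000000L };

  for (int i = 0; i < max_tries; i++)
    {
      int bits = read_entropy_avail ();
      if (bits < 0)
        return -1;
      if (bits >= ENTROPY_READY_BITS)
        return 0;
      nanosleep (&delay, NULL);
    }
  return 1;
}
```

Only a -1 or a timeout here would then need to be treated as the fatal
last resort.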

(Note: I know some folks have raised seccomp sandboxing as an issue
too, but unlike kernel which is sometimes locked in by legacy
hardware, bad seccomp filters are in principle always fixable and are
a form of user/admin error since it's not valid to make assumptions
about what syscalls libc needs.)

Rich

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: arc4random - are you sure we want these?
  2022-07-25 18:49             ` Rich Felker
@ 2022-07-27  1:54               ` Theodore Ts'o
  2022-07-27  2:16                 ` Rich Felker
  2022-07-27 11:34                 ` Adhemerval Zanella Netto
  0 siblings, 2 replies; 81+ messages in thread
From: Theodore Ts'o @ 2022-07-27  1:54 UTC (permalink / raw)
  To: Rich Felker
  Cc: Cristian Rodríguez, Florian Weimer, Yann Droneaud,
	Jason A. Donenfeld, libc-alpha, Michael, jann, linux-crypto

On Mon, Jul 25, 2022 at 02:49:30PM -0400, Rich Felker wrote:
> 
> You can at least try the sysctl and possibly also /dev approaches and
> only treat this as fatal as a last resort. If you can inspect
> entropy_avail or poll /dev/random to determine that the pool is
> initialized this is very safe, I think. And some research on distro
> practices might uncover whether this should be believed to be
> complete.

I think people are *way* too worried about what happens if /dev/random
is symlinked to /dev/urandom, and/or other bits of insanity.

The getrandom(2) system call has been around since v3.17.  That's
2014.  Even an ancient, obsolete enterprise distro like RHEL 7
backported the getrandom system call in 2017 --- a full 5 years ago.
If someone is still using a pre-2017, or $DEITY help them, pre-2014
kernel, that kernel will be so riddled with zero-day vulnerabilities
that some fallback to a /dev/urandom at boot time will be the
***least*** of their worries from a security perspective.  And that's
assuming someone who is so hide-bound as to be using a badly obsolete
kernel would be interested in going to a bleeding edge libc in the
first place!

Similarly the LTS kernels have gotten backports of Jason's latest
enhancements to the /dev/random driver.  Someone who is using an
out-of-date LTS kernel is similarly likely to be exposed to any number
of zero-day vulnerabilities.  Hence, the primary path that glibc
should be concerned about, IMHO, should assume that getrandom(2) is
(a) secure, and (b) fast.

The other thing to note here is this really is an over-constrained
problem.  Some people will insist, strongly, that they need
cryptographically secure random numbers, above all else.  Others will
insist that the interface for getting secure random numbers must never
block.  Still others will insist that they be able to use the
crappiest CPUs, on systems with absolutely no entropy that can be
harvested from I/O devices, and that they be able to generate mission-
or life-critical cryptographic keys milliseconds after the user
removes the consumer grade IOT device from the box, and plugs it into
the wall for the first time.

It is ***impossible*** to satisfy all of these constraints.  We do the
best that we can in the kernel, but it's an order of magnitude harder
to do it in userspace.  So unless you want to cop out by saying,
"arc4random isn't really secure, so when 10% of all devices reachable
on the internet can be breached, don't blame us", I strongly recommend
that you leave things to the kernel.

      		       	  	  - Ted
---
"Remember, the 'S' in IOT stands for security."

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: arc4random - are you sure we want these?
  2022-07-27  1:54               ` Theodore Ts'o
@ 2022-07-27  2:16                 ` Rich Felker
  2022-07-27  2:45                   ` Theodore Ts'o
  2022-07-27 11:34                 ` Adhemerval Zanella Netto
  1 sibling, 1 reply; 81+ messages in thread
From: Rich Felker @ 2022-07-27  2:16 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha,
	linux-crypto, Michael, jann

On Tue, Jul 26, 2022 at 09:54:30PM -0400, Theodore Ts'o via Libc-alpha wrote:
> On Mon, Jul 25, 2022 at 02:49:30PM -0400, Rich Felker wrote:
> > 
> > You can at least try the sysctl and possibly also /dev approaches and
> > only treat this as fatal as a last resort. If you can inspect
> > entropy_avail or poll /dev/random to determine that the pool is
> > initialized this is very safe, I think. And some research on distro
> > practices might uncover whether this should be believed to be
> > complete.
> 
> I think people are *way* too worried about what happens if /dev/random
> is symlinked to /dev/urandom, and/or other bits of insanitry.
> 
> The getrandom(3) system call has been around since v3.17.  That's
> 2014.

Last year I helped someone get musl up and running with EABI userspace
(all we support) on a pre-EABI kernel (2.6.18 or so?) on embedded
hardware in use in the field that could not be upgraded for hardware
support reasons. Assuming post-2014 kernel may be okay for
desktop/server distros but from my perspective it's pretty
unthinkable.

> Even an ancient, obsolete enterprise distro like RHEL 7
> backported the getrandom system call in 2017 --- a full 5 years ago.
> If someone is still using a pre-2017, or $DEITY help them, pre-2014
> kernel, that kernel will be so riddled with zero-day vulnerabilities
> that some fallback to a /dev/urandom at boot time will be the
> ***least*** of their worries from a security perspective.  And that's
> assuming someone who is so hide-bound as to be using a badly obsolete
> kernel would be interested in going to a bleeding edge libc in the
> first place!

There's a huge difference in zero-day vulnerabilities which might
exist nowhere but on a box that's not exposed to the outside world,
and possibly creating compromised key material from said boxes. And
weird embedded stuff that can't be upgraded is *also* the same setting
where you have a complete lack of early boot entropy.

I'm fine with folks who need this stuff coming to musl instead of
glibc, but I think folks on the glibc side are doing right to at least
*consider* whether/how it matters rather than writing anything older
than a few years off as irrelevant.

Rich

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: arc4random - are you sure we want these?
  2022-07-27  2:16                 ` Rich Felker
@ 2022-07-27  2:45                   ` Theodore Ts'o
  0 siblings, 0 replies; 81+ messages in thread
From: Theodore Ts'o @ 2022-07-27  2:45 UTC (permalink / raw)
  To: Rich Felker
  Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha,
	linux-crypto, Michael, jann

On Tue, Jul 26, 2022 at 10:16:19PM -0400, Rich Felker wrote:
> Last year I helped someone get musl up and running with EABI userspace
> (all we support) on a pre-EABI kernel (2.6.18 or so?) on embedded
> hardware in use in the field that could not be upgraded for hardware
> support reasons. Assuming post-2014 kernel may be okay for
> desktop/server distros but from my perspective it's pretty
> unthinkable.

Was that machine on the network in any way?  Why did it need
cryptographic keys in the first place?  Sure, maybe there are some
super-rare cases where you just *happen* to decide that you need to
use ancient hardware to generate keys that are communicated over the
serial console to sign official RPM packages for some distro --- but I
would *hope* that the distro could afford to spring for hardware that
wasn't antediluvian.

It's fair that there are stupid people out there who think that it's
an OK thing to use software which is riddled with zero-days because
they're too cheap to update their hardware.  But my point is that if
you are worrying about fallback to /dev/urandom being a security hole,
what *other* security holes might exist on that system?

I can understand the argument that the machine shouldn't fail, which
probably means you want to make sure the ancient code
doesn't block forever, even if they are generating RSA
public/private keypairs for SSL certificates in their init.d scripts,
milliseconds after being booted on CPUs so ancient that they don't
support RDRAND.  But let's be real here about how secure that system
is **actually** going to be, even *if* the random number generator is
perfect(tm) and bug-free(tm).  Let's not kid ourselves.

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: arc4random - are you sure we want these?
  2022-07-27  1:54               ` Theodore Ts'o
  2022-07-27  2:16                 ` Rich Felker
@ 2022-07-27 11:34                 ` Adhemerval Zanella Netto
  2022-07-27 12:32                   ` Theodore Ts'o
  2022-07-27 15:39                   ` Rich Felker
  1 sibling, 2 replies; 81+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-27 11:34 UTC (permalink / raw)
  To: Theodore Ts'o, Rich Felker
  Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha,
	linux-crypto, Michael, jann



On 26/07/22 22:54, Theodore Ts'o via Libc-alpha wrote:
> On Mon, Jul 25, 2022 at 02:49:30PM -0400, Rich Felker wrote:
>>
>> You can at least try the sysctl and possibly also /dev approaches and
>> only treat this as fatal as a last resort. If you can inspect
>> entropy_avail or poll /dev/random to determine that the pool is
>> initialized this is very safe, I think. And some research on distro
>> practices might uncover whether this should be believed to be
>> complete.
> 
> I think people are *way* too worried about what happens if /dev/random
> is symlinked to /dev/urandom, and/or other bits of insanitry.

On the glibc side, my view has settled on keeping the /dev/urandom fallback,
mainly to give the ancient kernels that we still nominally support a way
to call arc4random without aborting the process (which seemed to be
a 'feature' frowned upon when someone tried to standardize posix_random
with the Austin Group) and to provide a fallback if the environment, for
whatever reason, filters getrandom.

But to be realistic, newer glibc versions are usually deployed with newer
kernels, and running in an environment without getrandom support is highly
unlikely.  The only scenario where it might happen is if someone tries to
run a container on an older kernel (the one reason that prevented us
from raising the minimum supported kernel for x86_64 some years ago), but it
will most likely have the same issues you described (unless the vendor
spent a herculean amount of time on backporting).

The only thing I am kind of worried about is that we will need to be
judicious if we aim to use arc4random internally for hardening, since with
some usage patterns and kernels we might hit performance issues.  For
instance, we will need to tune down some internal parameters for a glibc
test because now, on a somewhat recent kernel (5.15.0-41-generic), I am seeing
a 10 runtime increase with the change to use getrandom.  Jason has
told me it has been fixed upstream, but taking into consideration that the
box is an updated Ubuntu 22.04, it might take some time for this fix to
propagate to all kernels out there.

> 
> The getrandom(3) system call has been around since v3.17.  That's
> 2014.  Even an ancient, obsolete enterprise distro like RHEL 7
> backported the getrandom system call in 2017 --- a full 5 years ago.
> If someone is still using a pre-2017, or $DEITY help them, pre-2014
> kernel, that kernel will be so riddled with zero-day vulnerabilities
> that some fallback to a /dev/urandom at boot time will be the
> ***least*** of their worries from a security perspective.  And that's
> assuming someone who is so hide-bound as to be using a badly obsolete
> kernel would be interested in going to a bleeding edge libc in the
> first place!
> 
> Similarly the LTS kernels have gotten backports of Jason's latest
> enhancements to the /dev/random driver.  Someone who is using an
> out-of-date LTS kernel is similarly likely to be exposed to any number
> of zero-day vulnerabilities.  Hence, the primary path that glibc
> should be concerned about, IMHO, should assume that getrandom(2) is
> (a) secure, and (b) fast.
> 
> The other thing to note here is this really is an over-constrained
> problem.  Some people will insist, strongly, that they need
> cryptographically secure random numbers, above all else.  Others will
> insist that the interface for getting secure random numbers must never
> block.  Still others will insist that they be able to use the
> crappiest CPU's, on systems with absolutely no entropy that can be
> harvested from I/O devices, and that they be able to generate mission-
> or -life critical cryptgraphic keys milliseconds after the user
> removes the consumer grade IOT device from the box, and plugs it into
> wall for the first time.
> 
> It is ***impossible*** to satisfy all of these constraints.  We do the
> best that we can in the kernel, but it's an order of magnitude harder
> to do it in userspace.  So unless you want to cop-out by saying,
> "arcrandom isn't really secure, so when 10% of all devices reachable
> on the internet can breached, don't blame us", I strongly recommend
> that you leave things to the kernel.
> 
>       		       	  	  - Ted
> ---
> "Remember, the 'S' in IOT stands for security."

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: arc4random - are you sure we want these?
  2022-07-27 11:34                 ` Adhemerval Zanella Netto
@ 2022-07-27 12:32                   ` Theodore Ts'o
  2022-07-27 12:49                     ` Florian Weimer
  2022-07-27 15:39                   ` Rich Felker
  1 sibling, 1 reply; 81+ messages in thread
From: Theodore Ts'o @ 2022-07-27 12:32 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: Rich Felker, Florian Weimer, Yann Droneaud, Jason A. Donenfeld,
	libc-alpha, linux-crypto, Michael, jann

On Wed, Jul 27, 2022 at 08:34:17AM -0300, Adhemerval Zanella Netto wrote:
> The only thing I am kinda worried is we will need to be judicious if
> we aim to use arc4random internally for hardening, since on some pattern
> usage and kernels we might hit some performance issues.  For instance,
> we will need to tune down some internal parameters for a glibc testing
> because now on a somewhat recent kernel (5.15.0-41-generic) I am seeing
> a 10 runtime increase which the change to use getrandom.  Jason has
> told it has been fixed upstream, but taking in consideration the box
> is an updated Ubuntu 22.04, it might take some time to have this fix
> propagated on all kernels out there.

What I'd suggest is that we be realistic about specific use cases.
Are we talking about scientific simulations?  You don't want to be
using secure random number generation for that anyway,
because most reputable scientists have this thing about repeatable
experiments.

Are we talking about key generation?  How many keys per second is it
really realistic that such a system would need to support?  How many
SSL connections would it be *able* to support?  And since a secure web
server or VPN gateway is going to be on the network, then you're going
to want the latest kernel fixes, since there have been quite a few
Really Bad Security vulnerabilities that have been fixed just in the
past week (especially if you care about FEDRAMP or PCI compliance) at
which point, you'll get the new and improved getrandom(2).

But even if you didn't take the latest kernels, I think you will find
that if you actually benchmark how many queries per second a real-life
secure web server or VPN gateway handles, even the original 5.15.0
/dev/random driver was plenty fast enough for real world cryptographic
use cases.
Sure, maybe numbers would look small on a low-end ARM system --- but
how many secure web transactions or IPSEC/wireguard connections could
such a low-end ARM system really support, *anyway*?

One of the dirty little secrets of web sites who live and die by
clickbait performance benchmark articles for advertising revenue is
how rarely real life workloads really are bottlenecked by things like,
say, file system or /dev/random benchmarks.  Reading those articles
are *fun*, for people who like to say that their systems' metrics are
longer/faster/stronger/whatever, but it's rare that they actually
impact real world use cases.  More often than not, the bottleneck is
elsewhere.

Cheers,

					- Ted

P.S.  The newer /dev/random driver would probably help out people who do things
like "dd if=/dev/urandom of=/dev/expensive-ssd-where-it-would-be-way-faster-
and-less-destructive-of-write-wearout-to-use-hdparm-security-erase bs=4k" ---
but that's not really relevant to the glibc arc4random() discussion.  :-)

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: arc4random - are you sure we want these?
  2022-07-27 12:32                   ` Theodore Ts'o
@ 2022-07-27 12:49                     ` Florian Weimer
  2022-07-27 20:15                       ` Theodore Ts'o
  0 siblings, 1 reply; 81+ messages in thread
From: Florian Weimer @ 2022-07-27 12:49 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Adhemerval Zanella Netto, Rich Felker, Yann Droneaud,
	Jason A. Donenfeld, libc-alpha, linux-crypto, Michael, jann

* Theodore Ts'o:

> But even if you didn't take the latest kernels, I think you will find
> that if you actually benchmark how many queries per second a real-life
> secure web server or VPN gateway, even the original 5.15.0 /dev/random
> driver was plenty fast enough for real world cryptographic use cases.

The idea is that arc4random() is suitable in pretty much all places
that have historically used random() (outside of deterministic
simulations).  Straight calls to getrandom are much, much slower than
random(), and it's not even the system call overhead.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: arc4random - are you sure we want these?
  2022-07-27 12:49                     ` Florian Weimer
@ 2022-07-27 20:15                       ` Theodore Ts'o
  2022-07-27 21:59                         ` Rich Felker
  2022-07-28  0:39                         ` Cristian Rodríguez
  0 siblings, 2 replies; 81+ messages in thread
From: Theodore Ts'o @ 2022-07-27 20:15 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Adhemerval Zanella Netto, Rich Felker, Yann Droneaud,
	Jason A. Donenfeld, libc-alpha, linux-crypto, Michael, jann

On Wed, Jul 27, 2022 at 02:49:57PM +0200, Florian Weimer wrote:
> * Theodore Ts'o:
> 
> > But even if you didn't take the latest kernels, I think you will find
> > that if you actually benchmark how many queries per second a real-life
> > secure web server or VPN gateway, even the original 5.15.0 /dev/random
> > driver was plenty fast enough for real world cryptographic use cases.
> 
> The idea is to that arc4random() is suitable in pretty much all places
> that have historically used random() (outside of deterministic
> simulations).  Straight calls to getrandom are much, much slower than
> random(), and it's not even the system call overhead.

What are those places?  And what are their performance and security
requirements?  I've heard some people claim that arc4random() is
supposed to provide strong security guarantees.  I've heard others
claim that it doesn't, or at least glibc was planning on disclaiming
security guarantees.  So there seems to be a lack of clarity about
the security requirements.

What about the performance requirements?  Designing an interface where
the requirement is "as fast as possible" is often not a great pathway to
success, because the reality is that engineering is always about
tradeoffs.

If there are no security requirements (given the claim that some
people want to put in the documentation disclaiming that arc4random
might not be secure), why not just have people continue to use
random(3)?

    	      	     	      	     - Ted

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: arc4random - are you sure we want these?
  2022-07-27 20:15                       ` Theodore Ts'o
@ 2022-07-27 21:59                         ` Rich Felker
  2022-07-28  0:30                           ` Theodore Ts'o
  2022-07-28  0:39                         ` Cristian Rodríguez
  1 sibling, 1 reply; 81+ messages in thread
From: Rich Felker @ 2022-07-27 21:59 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha,
	Michael, linux-crypto, jann

On Wed, Jul 27, 2022 at 04:15:24PM -0400, Theodore Ts'o via Libc-alpha wrote:
> On Wed, Jul 27, 2022 at 02:49:57PM +0200, Florian Weimer wrote:
> > * Theodore Ts'o:
> > 
> > > But even if you didn't take the latest kernels, I think you will find
> > > that if you actually benchmark how many queries per second a real-life
> > > secure web server or VPN gateway, even the original 5.15.0 /dev/random
> > > driver was plenty fast enough for real world cryptographic use cases.
> > 
> > The idea is to that arc4random() is suitable in pretty much all places
> > that have historically used random() (outside of deterministic
> > simulations).  Straight calls to getrandom are much, much slower than
> > random(), and it's not even the system call overhead.
> 
> What are those places?  And what are their performance and security
> requirements?  I've heard some people claim that arc4random() is
> supposed to provide strong security guarantees.  I've heard others
> claim that it doesn't, or at least glibc was planning on disclaiming
> security guaranteees.  So there seems to be a lack of clarity about
> the security requirements.

The only place I've heard of a viable "soft requirement" for real
entropy is for salting the hash function used in hash table maps to
harden them against DoS via intentional collisions. This is a small
but arguably legitimate usage domain. Most use of random() is not
this, and should not be this -- the value of deterministic execution
for ability to reproduce crashes, debug, etc. is real, and the value
of actual entropy vs a deterministic-seeded prng is imaginary.
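
For illustration only, the hash-salting case might look roughly like the
sketch below. The names table_salt, init_table_salt and salted_hash are
hypothetical, and FNV-1a is used purely to keep the example short; it is
an assumption of this sketch, not something proposed in this thread, and
a real table would more likely use a keyed hash such as SipHash.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical per-process salt: fetched once, then reused. */
static uint64_t table_salt;
static int table_salt_ready;

/* Fill table_salt from the kernel.  Returns 0 on success, -1 on error. */
static int init_table_salt(void)
{
    unsigned char *p = (unsigned char *)&table_salt;
    size_t have = 0;

    while (have < sizeof table_salt) {
        ssize_t n = getrandom(p + have, sizeof table_salt - have, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted while waiting; retry */
            return -1;          /* e.g. ENOSYS or a seccomp filter */
        }
        have += (size_t)n;
    }
    table_salt_ready = 1;
    return 0;
}

/* FNV-1a with the salt mixed into the initial state (illustrative). */
static uint64_t salted_hash(const void *key, size_t len)
{
    const unsigned char *k = key;
    uint64_t h = 14695981039346656037ULL ^ table_salt;

    assert(table_salt_ready);
    for (size_t i = 0; i < len; i++) {
        h ^= k[i];
        h *= 1099511628211ULL;
    }
    return h;
}
```

The point of the sketch is that the salt is drawn once per process (or
per table), so the cost of the system call is paid once rather than on
every lookup.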

The purpose of arc4random has always been *cryptographically secure*
entropy, not "gratuitously replace random() and break reproducible
behavior because the programmer does not understand the difference".
Nobody should be advocating for using these functions for anything
except secure secrets.

Rich


* Re: arc4random - are you sure we want these?
  2022-07-27 21:59                         ` Rich Felker
@ 2022-07-28  0:30                           ` Theodore Ts'o
  0 siblings, 0 replies; 81+ messages in thread
From: Theodore Ts'o @ 2022-07-28  0:30 UTC (permalink / raw)
  To: Rich Felker
  Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha,
	Michael, linux-crypto, jann

On Wed, Jul 27, 2022 at 05:59:49PM -0400, Rich Felker wrote:
> The only place I've heard of a viable "soft requirement" for real
> entropy is for salting the hash function used in hash table maps to
> harden them against DoS via intentional collisions. This is a small
> but arguably legitimate usage domain.

OK, so this is an issue that both Perl and Python have had to deal
with, as described here: https://lwn.net/Articles/474912/

Is that a fair description of the use case which you are describing?
Because if it is, in the worst case, we only need a single random
value for every http request made to the server.  Would you agree with
that?

I think you'll find that even the original getrandom(2) system call or
fetching a random value from /dev/urandom was plenty fast enough for
this particular use case.  If you're on some slow, ancient CPU, the
webserver isn't going to be able to handle that many queries per
second.  And if you're on a fast CPU, the original /dev/urandom and/or
getrandom(2) system call would be plenty fast enough.

This is why both Jason and I have been trying to push people to
clearly articulate a specific use case and the attendant performance
requirement, so we can test the hypothesis regarding how critical it
is to have a userspace cryptographically secure RNG, with all of the
attendant opportunities for security vulnerabilities in the face of VM
snapshots, or VMs getting duplicated with a pre-spun execution image,
etc., etc.

Cheers,

					- Ted


* Re: arc4random - are you sure we want these?
  2022-07-27 20:15                       ` Theodore Ts'o
  2022-07-27 21:59                         ` Rich Felker
@ 2022-07-28  0:39                         ` Cristian Rodríguez
  1 sibling, 0 replies; 81+ messages in thread
From: Cristian Rodríguez @ 2022-07-28  0:39 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, Rich Felker,
	libc-alpha, Michael, linux-crypto, jann

On Wed, Jul 27, 2022 at 4:15 PM Theodore Ts'o via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Wed, Jul 27, 2022 at 02:49:57PM +0200, Florian Weimer wrote:
> > * Theodore Ts'o:
> >
> > > But even if you didn't take the latest kernels, I think you will find
> > > that if you actually benchmark how many queries per second a real-life
> > > secure web server or VPN gateway handles, even the original 5.15.0
> > > /dev/random driver was plenty fast enough for real world cryptographic
> > > use cases.
> >
> > The idea is that arc4random() is suitable in pretty much all places
> > that have historically used random() (outside of deterministic
> > simulations).  Straight calls to getrandom are much, much slower than
> > random(), and it's not even the system call overhead.
>
> What are those places?

Well, pretty much everywhere a shared library is involved from the start.
On one very basic VM here there are 18 shared libraries using srandom(),
thus perturbing each other's state if loaded by the same process,
possibly in a catastrophic or predictable way.
And nobody uses the random_r interfaces.

> And what are their performance and security
> requirements?

Common programmers know nothing about this, and even seasoned ones don't.
If it runs slowly or is not a CSPRNG, then the average app will use some
userspace PRNG or CSPRNG, or buffer output from the kernel somewhere.
I do not have to justify this assertion: just download libgcrypt,
gnutls, or openssl. None of those libraries uses the kernel entropy
as the first option; all of them feed it into either proven or dubious
RNG schemes and then pass that to users.
Think about why that is and why we are discussing yet another interface
in the first place.


* Re: arc4random - are you sure we want these?
  2022-07-27 11:34                 ` Adhemerval Zanella Netto
  2022-07-27 12:32                   ` Theodore Ts'o
@ 2022-07-27 15:39                   ` Rich Felker
  1 sibling, 0 replies; 81+ messages in thread
From: Rich Felker @ 2022-07-27 15:39 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: Theodore Ts'o, Florian Weimer, Yann Droneaud,
	Jason A. Donenfeld, libc-alpha, linux-crypto, Michael, jann

On Wed, Jul 27, 2022 at 08:34:17AM -0300, Adhemerval Zanella Netto via Libc-alpha wrote:
> 
> 
> On 26/07/22 22:54, Theodore Ts'o via Libc-alpha wrote:
> > On Mon, Jul 25, 2022 at 02:49:30PM -0400, Rich Felker wrote:
> >>
> >> You can at least try the sysctl and possibly also /dev approaches and
> >> only treat this as fatal as a last resort. If you can inspect
> >> entropy_avail or poll /dev/random to determine that the pool is
> >> initialized this is very safe, I think. And some research on distro
> >> practices might uncover whether this should be believed to be
> >> complete.
> > 
> > I think people are *way* too worried about what happens if /dev/random
> > > is symlinked to /dev/urandom, and/or other bits of insanity.
> 
> On glibc, my view has settled on having the /dev/urandom fallback,
> mainly to give ancient kernels that we still nominally support a way
> to call arc4random without aborting the process (which seemed to be
> a 'feature' frowned upon when someone tried to standardize posix_random
> with the Austin Group) and to give a fallback if the environment for
> whatever reason filters getrandom.
> 
> But to be realistic, newer glibc versions are usually deployed with newer
> kernels, and running in an environment without getrandom support is highly
> unlikely.  The only scenario where it might happen is if someone tries to
> run some container on an older kernel (that is one reason that prevented
> us from raising the minimum supported kernel for x86_64 some years ago),
> but it will most likely have the same issues you described (unless the
> vendor spent a herculean amount of time on backporting).
> 
> The only thing I am kinda worried about is that we will need to be
> judicious if we aim to use arc4random internally for hardening, since on
> some pattern

If failure to support the functionality nukes the process, it's not
suitable for internal hardening. Also if it hangs forever in early
boot it's not suitable for internal hardening. AT_RANDOM is the
functionality for internal hardening, which glibc already uses and
should continue to use, extending it with chacha if more bytes are
needed.
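
As a rough sketch of what reading AT_RANDOM looks like from userspace:
getauxval(AT_RANDOM) returns the address of 16 random bytes that the
kernel placed in the auxiliary vector at exec time. The helper name
startup_random is hypothetical, and the chacha-based extension mentioned
above is not shown.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Copy up to 16 bytes of the kernel-provided startup randomness.
   Returns the number of bytes copied, or 0 if the auxiliary vector
   has no AT_RANDOM entry. */
static size_t startup_random(unsigned char *out, size_t len)
{
    const unsigned char *p =
        (const unsigned char *)(uintptr_t)getauxval(AT_RANDOM);

    assert(out != NULL);
    if (p == NULL)
        return 0;
    if (len > 16)
        len = 16;               /* only 16 bytes are provided */
    memcpy(out, p, len);
    return len;
}
```

This cannot block or fail once the process is running, but the same 16
bytes are returned on every call, so they are a seed, not a stream.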

Rich


* Re: arc4random - are you sure we want these?
  2022-07-23 16:25 ` arc4random - are you sure we want these? Jason A. Donenfeld
  2022-07-23 17:18   ` Paul Eggert
  2022-07-23 17:39   ` Adhemerval Zanella Netto
@ 2022-07-23 19:04   ` Cristian Rodríguez
  2022-07-23 22:59     ` Jason A. Donenfeld
  2022-07-25 10:14     ` Florian Weimer
  2022-07-25 10:11   ` Florian Weimer
  2022-07-25 22:57   ` [PATCH] arc4random: simplify design for better safety Jason A. Donenfeld
  4 siblings, 2 replies; 81+ messages in thread
From: Cristian Rodríguez @ 2022-07-23 19:04 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: libc-alpha, Adhemerval Zanella Netto, Florian Weimer,
	Yann Droneaud, jann, Michael

On Sat, Jul 23, 2022 at 12:25 PM Jason A. Donenfeld via Libc-alpha
<libc-alpha@sourceware.org> wrote:

> For that reason, past discussion of having some random number generation
> in userspace libcs has geared toward doing this in the vDSO, somehow,
> where the kernel can be part and parcel of that effort.

On Linux, just making this interface call "something" from the vDSO that

- does not block, and
- cannot ever fail, or, if it does indeed need to bail out, kills the
  calling thread as a last resort

(if neither of those can be provided, we are back to square one)

would be beyond awesome, because it could be usable everywhere,
including the dynamic linker, malloc, or whatever else.
The question is: is there at least an experimental patch available with
a hope of being accepted?


* Re: arc4random - are you sure we want these?
  2022-07-23 19:04   ` Cristian Rodríguez
@ 2022-07-23 22:59     ` Jason A. Donenfeld
  2022-07-24 16:23       ` Cristian Rodríguez
  2022-07-25 10:14     ` Florian Weimer
  1 sibling, 1 reply; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-23 22:59 UTC (permalink / raw)
  To: Cristian Rodríguez
  Cc: libc-alpha, Adhemerval Zanella Netto, Florian Weimer,
	Yann Droneaud, jann, Michael, linux-crypto

Hi Cristian,

On Sat, Jul 23, 2022 at 03:04:36PM -0400, Cristian Rodríguez wrote:
> On Linux, just making this interface call "something" from the vDSO that
> 
> - does not block, and
> - cannot ever fail, or, if it does indeed need to bail out, kills the
>   calling thread as a last resort
> 
> (if neither of those can be provided, we are back to square one)
> 
> would be beyond awesome, because it could be usable everywhere,
> including the dynamic linker, malloc, or whatever else.
> The question is: is there at least an experimental patch available with
> a hope of being accepted?

Doesn't getrandom() already basically have this quality? If you call
getrandom(0), it'll block until the RNG is initialized once (which now
happens pretty reliably early on in boot). If you call
getrandom(GRND_INSECURE), it will skip that blocking. Both mechanisms are
reliable and available on all current kernel.org stable kernels.
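
A minimal sketch of those two call styles follows. The helper names are
hypothetical, and the fallback #define assumes the value 0x0004 from
linux/random.h for libc headers that do not yet expose GRND_INSECURE.
The sketch limits requests to 256 bytes, the size up to which
getrandom(2) documents that reads are not cut short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

#ifndef GRND_INSECURE
#define GRND_INSECURE 0x0004    /* from linux/random.h */
#endif

/* Blocks only until the kernel RNG has been initialized once. */
static ssize_t random_bytes_blocking(void *buf, size_t len)
{
    ssize_t n;

    assert(len <= 256);         /* larger requests may return short */
    do {
        n = getrandom(buf, len, 0);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Never blocks; output may be weak before the RNG has been seeded.
   Returns -1 with errno set (e.g. EINVAL) where the flag is unknown. */
static ssize_t random_bytes_early(void *buf, size_t len)
{
    ssize_t n;

    assert(len <= 256);
    do {
        n = getrandom(buf, len, GRND_INSECURE);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

A caller that must run on kernels without the flag would need to handle
the -1 return from random_bytes_early() itself.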

Is there something about these you don't like and think need fixing? I'm
open to suggestions on how to further improve that interface if it has a
notable shortcoming.

If somebody has a compelling performance case that's widespread and
can't be fixed in the kernel alone, I wouldn't be averse to vDSOing it.
But such an undertaking would probably be contingent on doing this with
the glibc developers, rather than trying to retroactively bandaid an
addition that shipped broken with a documentation cop-out.

Jason


* Re: arc4random - are you sure we want these?
  2022-07-23 22:59     ` Jason A. Donenfeld
@ 2022-07-24 16:23       ` Cristian Rodríguez
  2022-07-24 21:57         ` Jason A. Donenfeld
  0 siblings, 1 reply; 81+ messages in thread
From: Cristian Rodríguez @ 2022-07-24 16:23 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: libc-alpha, Adhemerval Zanella Netto, Florian Weimer,
	Yann Droneaud, jann, Michael, linux-crypto

On Sat, Jul 23, 2022 at 6:59 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:

> Doesn't getrandom() already basically have this quality?

In current kernels, yes. Problems with old kernels remain. The syscall
overhead being too high for some use cases is still a remaining
problem;
if that were overcome, it could be used literally for everything,
including simulations and other stuff.


* Re: arc4random - are you sure we want these?
  2022-07-24 16:23       ` Cristian Rodríguez
@ 2022-07-24 21:57         ` Jason A. Donenfeld
  0 siblings, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-24 21:57 UTC (permalink / raw)
  To: Cristian Rodríguez
  Cc: libc-alpha, Adhemerval Zanella Netto, Florian Weimer,
	Yann Droneaud, jann, Michael, linux-crypto

Hi Cristian,

On Sun, Jul 24, 2022 at 12:23:43PM -0400, Cristian Rodríguez wrote:
> On Sat, Jul 23, 2022 at 6:59 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> 
> > Doesn't getrandom() already basically have this quality?
> 
> In current kernels, yes. Problems with old kernels remain.

Can you outline specifically which kernels you think those are and what
you think the problems are? And how arc4random as currently
implemented does away with those problems?

I kind of suspect you don't have something specific in mind...

> The syscall
> overhead being too high for some use cases is still a remaining
> problem,

Really? Do you have any numbers? I would be very surprised to hear that
this is affecting things that intend to use arc4random as a substitute.
Could you give me specifics on this? Again, this sounds made up in the
absence of something real, widespread, and particular.

> if that were overcome, it could be used literally for everything,
> including simulations and other stuff.

You mentioned simulations, but actually simulations are one thing where
you want repeatable randomness -- something insecure with a seed that
gives a good distribution and is extremely fast, so that you can repeat
your simulation with the same data if need be. For this there are various
LFSRs and such that work fine and are well explored. But that's not what
getrandom() is, nor arc4random().

More generally speaking, there are well-defined RNGs that are for
simulations and take seeds, and there are well-defined RNGs that are
sufficient for crypto, and then there's a massive valley of ill-defined
junk in between that people keep shooting themselves in the foot with.

The fact that you won't even call arc4random cryptographically secure
(according to Adhemerval's comment) indicates to me that something has
gone wrong here.

So, please, I urge you to put the brakes on this a little bit. Come up
with numbers. Let's lay out the interfaces and properties we want. And
then we'll see what we can draw up together.

But now I'm just repeating myself. See my earlier reply here:
https://lore.kernel.org/linux-crypto/Ytx8GKSZfRt+ZrEO@zx2c4.com/

Jason


* Re: arc4random - are you sure we want these?
  2022-07-23 19:04   ` Cristian Rodríguez
  2022-07-23 22:59     ` Jason A. Donenfeld
@ 2022-07-25 10:14     ` Florian Weimer
  1 sibling, 0 replies; 81+ messages in thread
From: Florian Weimer @ 2022-07-25 10:14 UTC (permalink / raw)
  To: Cristian Rodríguez
  Cc: Jason A. Donenfeld, libc-alpha, Adhemerval Zanella Netto,
	Yann Droneaud, jann, Michael

* Cristian Rodríguez:

> On Sat, Jul 23, 2022 at 12:25 PM Jason A. Donenfeld via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>
>> For that reason, past discussion of having some random number generation
>> in userspace libcs has geared toward doing this in the vDSO, somehow,
>> where the kernel can be part and parcel of that effort.
>
> On Linux, just making this interface call "something" from the vDSO that
>
> - does not block, and
> - cannot ever fail, or, if it does indeed need to bail out, kills the
>   calling thread as a last resort
>
> (if neither of those can be provided, we are back to square one)
>
> would be beyond awesome, because it could be usable everywhere,
> including the dynamic linker, malloc, or whatever else.
> The question is: is there at least an experimental patch available with
> a hope of being accepted?

I agree that this would be nice, but we'd likely have to donate
thread-specific data for kernel use, and that's currently totally
vaporware.

The “cannot ever fail” part is impossible to achieve due to old kernels
and seccomp filters.  Low-level userspace needs to paper over it in some
way, so that applications don't have to deal with it.

Thanks,
Florian



* Re: arc4random - are you sure we want these?
  2022-07-23 16:25 ` arc4random - are you sure we want these? Jason A. Donenfeld
                     ` (2 preceding siblings ...)
  2022-07-23 19:04   ` Cristian Rodríguez
@ 2022-07-25 10:11   ` Florian Weimer
  2022-07-25 11:04     ` Jason A. Donenfeld
  2022-07-25 14:56     ` Rich Felker
  2022-07-25 22:57   ` [PATCH] arc4random: simplify design for better safety Jason A. Donenfeld
  4 siblings, 2 replies; 81+ messages in thread
From: Florian Weimer @ 2022-07-25 10:11 UTC (permalink / raw)
  To: Jason A. Donenfeld via Libc-alpha
  Cc: Adhemerval Zanella Netto, Yann Droneaud, jann, Michael,
	Jason A. Donenfeld

* Jason A. Donenfeld via Libc-alpha:

> I really wonder whether this is a good idea, whether this is something
> that glibc wants, and whether it's a design worth committing to in the
> long term.

Do you object to the interface, or the implementation?

The implementation can be improved easily enough at a later date.

> Firstly, for what use cases does this actually help? As of recent
> changes to the Linux kernels -- now backported all the way to 4.9! --
> getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> states locklessly. Sure you avoid a syscall by doing that in userspace,
> but does it really matter? Who exactly benefits from this?

getrandom may be fast for bulk generation.  It's not that great for
generating a few bits here and there.  For example, shuffling a
1,000-element array takes 18 microseconds with arc4random_uniform in
glibc, and 255 microseconds with the naïve getrandom-based
implementation (with slightly biased results; measured on an Intel
i9-10900T, Fedora's kernel-5.18.11-100.fc35.x86_64).
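
The benchmark code itself is not included in this message. As an
assumption about what a naive getrandom-based shuffle could look like,
here is a Fisher-Yates sketch that makes one system call per element and
reduces the result with a plain modulo, which is where a slight bias
would come from. The names naive_bounded and shuffle_naive are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* One system call per draw; the modulo makes results slightly biased
   whenever bound does not divide 2^32. */
static uint32_t naive_bounded(uint32_t bound)
{
    uint32_t r;

    assert(bound > 0);
    if (getrandom(&r, sizeof r, 0) != (ssize_t)sizeof r)
        abort();
    return r % bound;
}

/* Fisher-Yates shuffle: swap each slot with a random earlier-or-same
   slot, walking from the end of the array towards the start. */
static void shuffle_naive(int *a, size_t n)
{
    for (size_t i = n; i > 1; i--) {
        size_t j = naive_bounded((uint32_t)i);
        int tmp = a[i - 1];

        a[i - 1] = a[j];
        a[j] = tmp;
    }
}
```

For a 1,000-element array this issues 999 getrandom calls of four bytes
each, so the per-call cost dominates rather than the bulk throughput.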

> You miss out on this with arc4random, and if that information _is_ to be
> exported to userspace somehow in the future, it would be awfully nice to
> design the userspace interface alongside the kernel one.

What is the kernel interface you are talking about?  From an interface
standpoint, arc4random_buf and getrandom are very similar, with the main
difference being that arc4random_buf cannot report failure (except by
terminating the process).

> Seen from this perspective, going with OpenBSD's older paradigm might be
> rather limiting. Why not work together, between the kernel and libc, to
> see if we can come up with something better, before settling on an
> interface with semantics that are hard to walk back later?

Historically, kernel developers were not interested in solving some of
the hard problems (especially early seeding) that prevent the use of
getrandom during early userspace stages.

> As-is, it's hard to recommend that anybody really use these functions.
> Just keep using getrandom(2), which has mostly favorable semantics.

Some applications still need to run in configurations where getrandom is
not available (either because the kernel is too old, or because it has
been disabled via seccomp).

> Yes, I get it: it's fun to make a random number generator, and so lots
> of projects figure out some way to make yet another one somewhere
> somehow. But the tendency to do so feels like a weird computer tinkerer
> disease rather than something that has ever helped the overall ecosystem.

The performance numbers suggest that we benefit from buffering in user
space.  It might not be necessary to implement expansion in userspace.
getrandom (or /dev/urandom) with a moderately-sized buffer could be
sufficient.
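
To make the buffering idea concrete, here is a deliberately simplified
sketch, not glibc's implementation. The names pool, pool_used and
buffered_u32 are hypothetical. It is not thread-safe, and it has no
handling at all for fork or for VM snapshots, so a buffer filled before
either event would be handed out again afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/random.h>
#include <sys/types.h>

#define POOL_SIZE 256           /* getrandom does not cut this short */

static unsigned char pool[POOL_SIZE];
static size_t pool_used = POOL_SIZE;    /* empty: first call refills */

/* Return 4 bytes from the pool, refilling with one system call when
   fewer than 4 unused bytes remain. */
static uint32_t buffered_u32(void)
{
    uint32_t r;

    assert(pool_used <= POOL_SIZE);
    if (pool_used + sizeof r > POOL_SIZE) {
        if (getrandom(pool, POOL_SIZE, 0) != POOL_SIZE)
            abort();
        pool_used = 0;
    }
    memcpy(&r, pool + pool_used, sizeof r);
    /* Best effort: do not keep bytes that were already handed out. */
    memset(pool + pool_used, 0, sizeof r);
    pool_used += sizeof r;
    return r;
}
```

With these sizes, 64 draws cost one system call instead of 64.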

But that's an implementation detail, and something we can revisit later.
If we get vDSO acceleration for getrandom (maybe using the userspace
thread-specific data donation we discussed for rseq), we might
eventually do away with the buffering in glibc.  Again this is an
implementation detail we can change easily enough.

Thanks,
Florian



* Re: arc4random - are you sure we want these?
  2022-07-25 10:11   ` Florian Weimer
@ 2022-07-25 11:04     ` Jason A. Donenfeld
  2022-07-25 12:39       ` Florian Weimer
  2022-07-25 13:25       ` Jeffrey Walton
  2022-07-25 14:56     ` Rich Felker
  1 sibling, 2 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-25 11:04 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Jason A. Donenfeld via Libc-alpha, Adhemerval Zanella Netto,
	Yann Droneaud, jann, Michael, linux-crypto

Hi Florian,

On Mon, Jul 25, 2022 at 12:11:27PM +0200, Florian Weimer wrote:
> > I really wonder whether this is a good idea, whether this is something
> > that glibc wants, and whether it's a design worth committing to in the
> > long term.
> 
> Do you object to the interface, or the implementation?
> 
> The implementation can be improved easily enough at a later date.

Sort of both, as I don't think it's wise to commit to the former without
a good idea of the full ideal space of the latter, and very clearly from
reading that discussion, that hasn't been explored.

In particular, Adhemerval has said you won't be committing to making
arc4random suitable for crypto, going so far as to mention it's not a
CSPRNG in the documentation. As I described in my reply to him (please
read that), the "documentation cop-out" will lead to tears inevitably.
Not only is that dangerous and bad to do alone, but it severely muddies
the waters with what other operating systems suggest about its permitted
use cases.

Here's that email for reference:
https://lore.kernel.org/linux-crypto/Ytx8GKSZfRt+ZrEO@zx2c4.com/

If you're going to ship an interface that people *will* use for
sensitive things -- especially considering Paul's comment about the intent
being "source code compatibility" -- then you must not ship it knowingly
broken by design. There's no amount of documentation papering that makes
this okay. Until you know how to implement it well, don't ship the
interface. And maybe in the process of trying to implement it well,
you'll find something suboptimal about the interface that can be
fixed.

> > Firstly, for what use cases does this actually help? As of recent
> > changes to the Linux kernels -- now backported all the way to 4.9! --
> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> > states locklessly. Sure you avoid a syscall by doing that in userspace,
> > but does it really matter? Who exactly benefits from this?
> 
> getrandom may be fast for bulk generation.  It's not that great for
> generating a few bits here and there.  For example, shuffling a
> 1,000-element array takes 18 microseconds with arc4random_uniform in
> glibc, and 255 microseconds with the naïve getrandom-based
> implementation (with slightly biased results; measured on an Intel
> i9-10900T, Fedora's kernel-5.18.11-100.fc35.x86_64).

So maybe we should look into vDSO'ing getrandom(), if this is a problem
for real use cases, and you find that these sorts of things are
widespread in real code?

> > You miss out on this with arc4random, and if that information _is_ to be
> > exported to userspace somehow in the future, it would be awfully nice to
> > design the userspace interface alongside the kernel one.
> 
> What is the kernel interface you are talking about?  From an interface
> standpoint, arc4random_buf and getrandom are very similar, with the main
> difference being that arc4random_buf cannot report failure (except by
> terminating the process).

Referring to information above about reseeding. So in this case it would
be some form of a generation counter most likely. There's also been some
discussion about exporting some aspect of the vmgenid counter to
userspace.

> > Seen from this perspective, going with OpenBSD's older paradigm might be
> > rather limiting. Why not work together, between the kernel and libc, to
> > see if we can come up with something better, before settling on an
> > interface with semantics that are hard to walk back later?
> 
> Historically, kernel developers were not interested in solving some of
> the hard problems (especially early seeding) that prevent the use of
> getrandom during early userspace stages.

I really don't know what you're talking about here. I understood you up
until the opening parenthesis, and initially thought to reply, "but I am
interested! let's work together" or something, but then you mentioned
getrandom()'s issues with early userspace, and I became confused. If you
use getrandom(GRND_INSECURE), it won't block and you'll get bytes even
before the rng has seeded. If you use getrandom(0), the kernel's RNG
will use jitter to seed itself ASAP so it doesn't block forever (on
platforms where that's possible, anyhow). Both of these qualities mostly
predate my heavy involvement. So your statement confuses me. But with
that said, if you do find some lack of interest on something you think
is important, please give me a try, and maybe you'll have better luck. I
very much am interested in solving longstanding problems in this domain.

> > As-is, it's hard to recommend that anybody really use these functions.
> > Just keep using getrandom(2), which has mostly favorable semantics.
> 
> Some applications still need to run in configurations where getrandom is
> not available (either because the kernel is too old, or because it has
> been disabled via seccomp).

I don't quite understand this. People without getrandom() typically
fall back to using /dev/urandom. "But what if FD in derp derp mountns
derp rlimit derp explosion derp?!" Yes, sure, which is why getrandom()
came about. But doesn't arc4random() fall back to using /dev/urandom in
this exact same way? I don't see how arc4random() really changes the
equation here, except that maybe I should amend my statement to say,
"Just keep using getrandom(2) or /dev/urandom, which has mostly
favorable semantics." (After all, I didn't see any wild-n-crazy fallback
to AT_RANDOM like what systemd does with random-util.c:
https://github.com/systemd/systemd/blob/main/src/basic/random-util.c )

Seen in that sense, as I wrote to Paul, if you're after arc4random for
source code compatibility -- or because you simply like its non-failing
interface and want to commit to that no matter the costs whatsoever --
then you could start by making that a light shim around getrandom()
(falling back to /dev/urandom, I guess), and then we can look into ways
of accelerating getrandom() for new kernels. This way you don't ship
something broken out of the gate, and there's still room for
improvement. Though I would still note that committing to the interface
early like this comes with some concern.
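
To sketch what such a light shim could look like: the function below
tries getrandom() first and reads /dev/urandom only if the system call
is unavailable, aborting if neither works. The name random_buf_shim is
hypothetical; this is an assumption-laden illustration, not the glibc
code, and it does not wait for the pool to be initialized before reading
/dev/urandom.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/* Fills buf with len random bytes, or aborts the process. */
static void random_buf_shim(void *buf, size_t len)
{
    unsigned char *p = buf;

    assert(buf != NULL || len == 0);

    /* Preferred path: the getrandom(2) system call. */
    while (len > 0) {
        ssize_t n = getrandom(p, len, 0);
        if (n > 0) {
            p += n;
            len -= (size_t)n;
        } else if (n < 0 && errno == EINTR) {
            continue;
        } else {
            break;      /* e.g. ENOSYS, or blocked by a seccomp filter */
        }
    }
    if (len == 0)
        return;

    /* Fallback path: read the remainder from /dev/urandom. */
    int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        abort();
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n > 0) {
            p += n;
            len -= (size_t)n;
        } else if (n < 0 && errno == EINTR) {
            continue;
        } else {
            abort();
        }
    }
    close(fd);
}
```

Because no state is kept in userspace between calls, there is nothing to
invalidate on fork or VM resume; the trade-off is one system call per
request.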

> The performance numbers suggest that we benefit from buffering in user
> space.

The question is whether it's safe and advisable to buffer this way in
userspace. Does userspace have the right information now of when to
discard the buffer and get a new one? I suspect it does not.

> But that's an implementation detail, and something we can revisit later.

No, these are not mere implementation details. When Adhemerval is
talking about warning people in the documentation that this shouldn't be
used for crypto, that should be a wake up call that something is really
off here. Don't ship things you know are broken, and then call that an
"implementation detail" that can be hedged with "documentation".

If a new function, extra_deluxe_memset(), occasionally wrote a 0x41
somewhere unexpected, you'd laugh if somebody called that a mere
implementation detail and suggested you just slap a warning in the
documentation and call it a day.

Jason


* Re: arc4random - are you sure we want these?
  2022-07-25 11:04     ` Jason A. Donenfeld
@ 2022-07-25 12:39       ` Florian Weimer
  2022-07-25 13:43         ` Jason A. Donenfeld
                           ` (2 more replies)
  2022-07-25 13:25       ` Jeffrey Walton
  1 sibling, 3 replies; 81+ messages in thread
From: Florian Weimer @ 2022-07-25 12:39 UTC (permalink / raw)
  To: Jason A. Donenfeld via Libc-alpha
  Cc: Jason A. Donenfeld, Yann Droneaud, Michael, linux-crypto, jann

* Jason A. Donenfeld via Libc-alpha:

> Hi Florian,
>
> On Mon, Jul 25, 2022 at 12:11:27PM +0200, Florian Weimer wrote:
>> > I really wonder whether this is a good idea, whether this is something
>> > that glibc wants, and whether it's a design worth committing to in the
>> > long term.
>> 
>> Do you object to the interface, or the implementation?
>> 
>> The implementation can be improved easily enough at a later date.
>
> Sort of both, as I don't think it's wise to commit to the former without
> a good idea of the full ideal space of the latter, and very clearly from
> reading that discussion, that hasn't been explored.

But we are only concerned with the application interface.  Do we really
expect that to be different from arc4random_buf and its variants?

The interface between glibc and the kernel can be changed without
impacting applications.

> In particular, Adhemerval has said you won't be committing to making
> arc4random suitable for crypto, going so far as to mention it's not a
> CSPRNG in the documentation.

Below you suggest using GRND_INSECURE to avoid deadlocks during
booting.  It's documented in the UAPI header as “Return
non-cryptographic random bytes”.  I assume it's broadly equivalent to
reading from /dev/urandom (which we need to support for backwards
compatibility, and currently use to avoid blocking).  This means that we
cannot really document the resulting bits as cryptographically strong
from an application perspective because the kernel is not willing to
make this commitment.

>> > Firstly, for what use cases does this actually help? As of recent
>> > changes to the Linux kernels -- now backported all the way to 4.9! --
>> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
>> > states locklessly. Sure you avoid a syscall by doing that in userspace,
>> > but does it really matter? Who exactly benefits from this?
>> 
>> getrandom may be fast for bulk generation.  It's not that great for
>> generating a few bits here and there.  For example, shuffling a
>> 1,000-element array takes 18 microseconds with arc4random_uniform in
>> glibc, and 255 microseconds with the naïve getrandom-based
>> implementation (with slightly biased results; measured on an Intel
>> i9-10900T, Fedora's kernel-5.18.11-100.fc35.x86_64).
>
> So maybe we should look into vDSO'ing getrandom(), if this is a problem
> for real use cases, and you find that these sorts of things are
> widespread in real code?

We can investigate that, but it doesn't change the application
interface.

>> > You miss out on this with arc4random, and if that information _is_ to be
>> > exported to userspace somehow in the future, it would be awfully nice to
>> > design the userspace interface alongside the kernel one.
>> 
>> What is the kernel interface you are talking about?  From an interface
>> standpoint, arc4random_buf and getrandom are very similar, with the main
>> difference being that arc4random_buf cannot report failure (except by
>> terminating the process).
>
> Referring to information above about reseeding. So in this case it would
> be some form of a generation counter most likely. There's also been some
> discussion about exporting some aspect of the vmgenid counter to
> userspace.

We don't need any of that in userspace if the staging buffer is managed
by the kernel, which is why the thread-specific data donation is so
attractive as an approach.  The kernel knows where all these buffers are
located and can invalidate them as needed.

>> > Seen from this perspective, going with OpenBSD's older paradigm might be
>> > rather limiting. Why not work together, between the kernel and libc, to
>> > see if we can come up with something better, before settling on an
>> > interface with semantics that are hard to walk back later?
>> 
>> Historically, kernel developers were not interested in solving some of
>> the hard problems (especially early seeding) that prevent the use of
>> getrandom during early userspace stages.
>
> I really don't know what you're talking about here. I understood you up
> until the opening parenthesis, and initially thought to reply, "but I am
> interested! let's work together" or something, but then you mentioned
> getrandom()'s issues with early userspace, and I became confused. If you
> use getrandom(GRND_INSECURE), it won't block and you'll get bytes even
> before the rng has seeded. If you use getrandom(0), the kernel's RNG
> will use jitter to seed itself ASAP so it doesn't block forever (on
> platforms where that's possible, anyhow). Both of these qualities mostly
> predate my heavy involvement. So your statement confuses me. But with
> that said, if you do find some lack of interest on something you think
> is important, please give me a try, and maybe you'll have better luck. I
> very much am interested in solving longstanding problems in this domain.

I tried to de-escalate here, and clearly that didn't work.  The context
here is that historically, working with the “random” kernel maintainers
has been very difficult for many groups of people.  Many of us are tired
of those non-productive discussions.  I forgot that this has recently
changed on the kernel side.  I understand that it's taking years to
overcome these perceptions.  glibc is still struggling with this, too.

Regarding the technical aspect, GRND_INSECURE is somewhat new-ish, but
as I wrote above, its UAPI documentation is a bit scary.  Maybe it
would be possible to clarify this in the manual pages a bit?  I *assume*
that if we are willing to read from /dev/urandom, we can use
GRND_INSECURE right away to avoid that fallback path on sufficiently new
kernels.  But it would be nice to have confirmation.

>> > As-is, it's hard to recommend that anybody really use these functions.
>> > Just keep using getrandom(2), which has mostly favorable semantics.
>> 
>> Some applications still need to run in configurations where getrandom is
>> not available (either because the kernel is too old, or because it has
>> been disabled via seccomp).
>
> I don't quite understand this. People without getrandom() typically
> fallback to using /dev/urandom. "But what if FD in derp derp mountns
> derp rlimit derp explosion derp?!" Yes, sure, which is why getrandom()
> came about. But doesn't arc4random() fallback to using /dev/urandom in
> this exact same way? I don't see how arc4random() really changes the
> equation here, except that maybe I should amend my statement to say,
> "Just keep using getrandom(2) or /dev/urandom, which has mostly
> favorable semantics." (After all, I didn't see any wild-n-crazy fallback
> to AT_RANDOM like what systemd does with random-util.c:
> https://github.com/systemd/systemd/blob/main/src/basic/random-util.c )

I had some patches with AT_RANDOM fallback, including overwriting
AT_RANDOM with output from the seeded PRNG.  It's certainly messy.  I
probably didn't bother to post these patches given how bizarre the whole
thing was.  I did have fallback to CPU instructions, but that turned out
to be unworkable due to bugs in suspend on AMD CPUs (kernel or firmware,
unclear).

> Seen in that sense, as I wrote to Paul, if you're after arc4random for
> source code compatibility -- or because you simply like its non-failing
> interface and want to commit to that no matter the costs whatsoever --
> then you could start by making that a light shim around getrandom()
> (falling back to /dev/urandom, I guess), and then we can look into ways
> of accelerating getrandom() for new kernels. This way you don't ship
> something broken out of the gate, and there's still room for
> improvement. Though I would still note that committing to the interface
> early like this comes with some concern.

The ChaCha20 generator we currently have in the tree may not be
required, true.  But this doesn't make what we have today “broken”, it's
merely overly complicated.  And replacing that with a straight buffer
from getrandom does not change the external interface, so we can do this
any time we want.

>> The performance numbers suggest that we benefit from buffering in user
>> space.
>
> The question is whether it's safe and advisable to buffer this way in
> userspace. Does userspace have the right information now of when to
> discard the buffer and get a new one? I suspect it does not.

Not completely, no, but we can cover many cases.  I do not currently see
a way around that if we want to promote arc4random_uniform(limit) as a
replacement for random() % limit.

>> But that's an implementation detail, and something we can revisit later.
>
> No, these are not mere implementation details. When Adhemerval is
> talking about warning people in the documentation that this shouldn't be
> used for crypto, that should be a wake up call that something is really
> off here. Don't ship things you know are broken, and then call that an
> "implementation detail" that can be hedged with "documentation".

Again, given the issues around GRND_INSECURE (the reason why it exists),
we do not have much choice on the glibc side.  And these issues will be
there for the foreseeable future, whether glibc provides arc4random or
not.

Thanks,
Florian


* Re: arc4random - are you sure we want these?
  2022-07-25 12:39       ` Florian Weimer
@ 2022-07-25 13:43         ` Jason A. Donenfeld
  2022-07-25 13:58           ` Cristian Rodríguez
  2022-07-25 16:06           ` Rich Felker
  2022-07-26 14:27         ` Overwrittting AT_RANDOM after use (was Re: arc4random - are you sure we want these?) Yann Droneaud
  2022-07-26 14:35         ` arc4random - are you sure we want these? Yann Droneaud
  2 siblings, 2 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-25 13:43 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Jason A. Donenfeld via Libc-alpha, Yann Droneaud, Michael,
	linux-crypto, jann

Hi Florian,

On Mon, Jul 25, 2022 at 02:39:24PM +0200, Florian Weimer wrote:
> Below you suggest to use GRND_INSECURE to avoid deadlocks during
> booting.  It's documented in the UAPI header as “Return
> non-cryptographic random bytes”.  I assume it's broadly equivalent to
> reading from /dev/urandom (which we need to support for backwards
> compatibility, and currently use to avoid blocking).  This means that we
> cannot really document the resulting bits as cryptographically strong
> from an application perspective because the kernel is not willing to
> make this commitment.
> Regarding the technical aspect, GRND_INSECURE is somewhat new-ish, but
> as I wrote above, its UAPI documentation is a bit scary.  Maybe it
> would be possible to clarify this in the manual pages a bit?  I *assume*
> that if we are willing to read from /dev/urandom, we can use
> GRND_INSECURE right away to avoid that fallback path on sufficiently new
> kernels.  But it would be nice to have confirmation.

getrandom(GRND_INSECURE) is the same as getrandom(0), except before the
RNG is seeded, in which case the former will return ~garbage randomness
while the latter will block. The only current difference between
getrandom(GRND_INSECURE) and /dev/urandom is the latter will try for a
second to do the jitter entropy thing if the RNG isn't seeded yet.

I agree that the documentation around this is really bad. Actually, so
much of the documentation is out of date or confusing. Thanks for the
kick on this: I really do need to rewrite that / clean it up.

So with my random.c maintainer hat on: getrandom(GRND_INSECURE) will
return the same "quality" randomness as getrandom(0), except before
the RNG is initialized. I'll fix up the docs for that, but feel free to
refer to this statement ahead of that if you need.

Code-wise, the only relevant branch related to GRND_INSECURE is:

	if (!crng_ready() && !(flags & GRND_INSECURE)) {
		if (flags & GRND_NONBLOCK)
			return -EAGAIN;
		ret = wait_for_random_bytes();
		if (unlikely(ret))
			return ret;
	}

That means: if it's not ready, and you didn't pass _INSECURE, and you
didn't pass _NONBLOCK, then wait for the RNG to be ready, and error out
if that's interrupted by a signal. Other than that one block, it
continues on to do the same thing as getrandom(0).
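
Seen from userspace, the branch above translates into the following behaviour, shown here as a minimal sketch. The function name fill_nonblocking is hypothetical, the GRND_INSECURE fallback define assumes the value 0x0004 from the Linux UAPI header (older libc headers may not provide it), and requests are capped at 256 bytes because getrandom(2) documents that such reads are not cut short once the RNG is initialized.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>

#ifndef GRND_INSECURE
#define GRND_INSECURE 0x0004   /* assumed from the Linux UAPI header */
#endif

/* Hypothetical helper.  Returns 1 if buf was filled from a seeded
   RNG, 0 if it was filled before the RNG was ready (GRND_INSECURE
   path), and -1 on any other error. */
int fill_nonblocking(void *buf, size_t len)
{
    ssize_t r;

    if (len > 256) {
        errno = EINVAL;
        return -1;
    }

    /* Not ready + GRND_NONBLOCK: the kernel returns -EAGAIN. */
    r = getrandom(buf, len, GRND_NONBLOCK);
    if (r == (ssize_t)len)
        return 1;

    /* Not ready + GRND_INSECURE: the wait is skipped entirely. */
    if (r < 0 && errno == EAGAIN) {
        r = getrandom(buf, len, GRND_INSECURE);
        if (r == (ssize_t)len)
            return 0;
    }
    return -1;
}
```

A return value of 0 is exactly the case under discussion: the bytes arrive without blocking, but the caller cannot treat them as cryptographically strong.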

With that said, however, I think it'd be nice if you used only blocking
randomness, and shove the initialization problem at init systems and
bootloaders and such. In 5.20, for example, there'll be an x86 boot
protocol for GRUB and kexec and hypervisors and such to pass a seed, and
since a long time, there exists a device tree attribute for the same.
Proliferating "unsafe" /dev/urandom-style usage doesn't seem good for
the ecosystem at large. And I'm in general interested in seeing progress
on decades-long initialization-time seeding concerns.

> > Sort of both, as I don't think it's wise to commit to the former without
> > a good idea of the full ideal space of the latter, and very clearly from
> > reading that discussion, that hasn't been explored.
> 
> But we are only concerned with the application interface.  Do we really
> expect that to be different from arc4random_buf and its variants?
> 
> The interface between glibc and the kernel can be changed without
> impacting applications.

I feel like you missed the whole thrust of my argument, in which I
caution against shipping something that's known-broken, particularly
when it pertains to something sensitive like generating secret keys.

Regarding the application interface: it's still unclear what's best
until we start trying to see what the implementation would look like.
Just to pick something floating around in my head now since reading your
last email: there seems to be some question about whether arc4random
should block or not. If it's used for crypto, it probably should. But
maybe you want an interface that doesn't. Perhaps that discussion leads
naturally to exposing a flag. Or not! And then there are related
questions about what the return value should be, if any. The point is
that the devil is often in the details with these things, and I worry
about putting the cart before the horse here.

> >> > You miss out on this with arc4random, and if that information _is_ to be
> >> > exported to userspace somehow in the future, it would be awfully nice to
> >> > design the userspace interface alongside the kernel one.
> >> 
> >> What is the kernel interface you are talking about?  From an interface
> >> standpoint, arc4random_buf and getrandom are very similar, with the main
> >> difference being that arc4random_buf cannot report failure (except by
> >> terminating the process).
> >
> > Referring to information above about reseeding. So in this case it would
> > be some form of a generation counter most likely. There's also been some
> > discussion about exporting some aspect of the vmgenid counter to
> > userspace.
> 
> We don't need any of that in userspace if the staging buffer is managed
> by the kernel, which is why the thread-specific data donation is so
> attractive as an approach.  The kernel knows where all these buffers are
> located and can invalidate them as needed.

There still might be a need for userspace to have that information, for
network protocol implementations that need to drop their ephemeral keys
on a virtual machine fork, for example. But that's kind of a different
discussion. For the purposes of a vDSO'd getrandom(), I agree that the
kernel managing a buffer that's just an opaque blob to userspace is
probably the best option.

> I tried to de-escalate here, and clearly that didn't work.  The context
> here is that historically, working with the “random” kernel maintainers
> has been very difficult for many groups of people.  Many of us are tired
> of those non-productive discussions.  I forgot that this has recently
> changed on the kernel side.  I understand that it's taking years to
> overcome these perceptions.  glibc is still struggling with this, too.

Oh, I see what you're getting at. Yea, sure, things are potentially
different now. I'm eager to work on this, so if you're finding things
that are lacking, I'm all ears for fixing them.

> I had some patches with AT_RANDOM fallback, including overwriting
> AT_RANDOM with output from the seeded PRNG.  It's certainly messy.  I
> probably didn't bother to post these patches given how bizarre the whole
> thing was.  I did have fallback to CPU instructions, but that turned out
> to be unworkable due to bugs in suspend on AMD CPUs (kernel or firmware,
> unclear).

Yea, it's kind of tricky as other things might be using AT_RANDOM also
and then you have a whole race issue and domain separation and whatnot.
The thing in systemd isn't really good for crypto -- no forward secrecy
and such -- but it's ostensibly better than random().

> The ChaCha20 generator we currently have in the tree may not be
> required, true.  But this doesn't make what we have today “broken”, it's
> merely overly complicated.  And replacing that with a straight buffer
> from getrandom does not change the external interface, so we can do this
> any time we want.

Whether you use chacha20 in a fast key erasure construction, or you
buffer lots of bytes of getrandom() that you overwrite with zeros as you
use doesn't really matter in the sense that these are both just forms of
buffering. With the chacha20 one, you're reseeding after 16 megs, but of
course the state is smaller, but that doesn't matter. For purposes here,
we may as well treat that as buffering 16 megs of getrandom() output. My
concern with this buffering is that userspace doesn't know when to
invalidate the buffer. So a userspace that's using arc4random() for
crypto will potentially be missing something *important* that a
userspace who used getrandom() instead would have.
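
To make the "buffer and overwrite with zeros as you use" variant concrete, here is a minimal sketch under several assumptions: the pool size of 256 bytes and the names pool, pool_avail and buffered_random are invented for illustration, the code is not thread-safe, and explicit_bzero() is the glibc/BSD extension. Note what it does not contain: nothing tells it to throw the pool away after fork(), a VM snapshot, or a kernel reseed, which is the concern raised above.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE        /* for explicit_bzero() in glibc */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/random.h>

enum { POOL_SIZE = 256 };      /* illustrative size only */

static unsigned char pool[POOL_SIZE];
static size_t pool_avail;      /* unread bytes at the tail of pool */

/* Hypothetical buffered generator: refill from getrandom(0) when the
   pool is empty, hand out bytes, and erase each byte once handed out. */
void buffered_random(void *out, size_t len)
{
    unsigned char *dst = out;

    while (len > 0) {
        if (pool_avail == 0) {
            if (getrandom(pool, POOL_SIZE, 0) != POOL_SIZE)
                abort();
            pool_avail = POOL_SIZE;
        }

        size_t n = len < pool_avail ? len : pool_avail;
        unsigned char *src = pool + (POOL_SIZE - pool_avail);

        memcpy(dst, src, n);
        explicit_bzero(src, n);   /* consumed bytes never linger */
        pool_avail -= n;
        dst += n;
        len -= n;
    }
}
```

The unread tail of the pool is the part that can go stale: it was drawn before whatever event the kernel would have reacted to.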

When I brought this up with Adhemerval, his reply was that it doesn't
matter anyway because arc4random() is going to be documented as not for
cryptography. So it sounded like the author of it finds it worse too. So
yikes.

The whole point is that you shouldn't ship something sensitive that is
worse than what it will potentially replace, right out of the gate. Slow
down and get the thing right, and then ship it.

> Not completely, no, but we can cover many cases.  I do not currently see
> a way around that if we want to promote arc4random_uniform(limit) as a
> replacement for random() % limit.

I agree that the rejection sampling is the most useful function being
added. Let's say, just for the sake of argument, that you instead added
`getrandom_u{64,32,16,8}_uniform(u_type limit, unsigned long flags)`
that expanded to doing `getrandom(&integer, flags)` and then rejection
sampling on that in a loop like usual. It wouldn't be super great, so
the first optimization would be to observe that the cost of 32 bytes and
the cost of 4 bytes is the same, so you just grab 32 bytes at a time,
which basically guarantees you'll get a good number when rejection
sampling.  Alright, fine, but then maybe you want to use it for
shuffling, and then we have your syscall overhead measurements. But
that's where the vDSO approach comes into play for making it fast. Old
systems would have something work that's still safe. New systems would
have something work that's safe and fast. Nobody gets something less
safe. (As a sidenote, notice how my hypothetical API gives larger types
than arc4random_uniform's fixed u32, just sayin'.)
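
A minimal sketch of the 32-bit variant of that hypothetical API follows. The function name comes from the suggestion above and does not exist in any libc; the flags parameter is declared as unsigned int here to match getrandom(2), and aborting on failure is an assumption made for brevity. It grabs 32 bytes per call and rejects values below 2^32 mod limit so that the final modulo is unbiased.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>

/* Hypothetical: uniform value in [0, limit) via rejection sampling. */
uint32_t getrandom_u32_uniform(uint32_t limit, unsigned int flags)
{
    if (limit < 2)
        return 0;

    /* 2^32 mod limit, computed in 32-bit unsigned arithmetic.
       Values in [min, 2^32) form a whole number of limit-sized runs. */
    uint32_t min = -limit % limit;

    for (;;) {
        uint32_t batch[8];     /* 32 bytes costs the same as 4 */

        if (getrandom(batch, sizeof batch, flags) != (ssize_t)sizeof batch)
            abort();

        for (size_t i = 0; i < 8; i++)
            if (batch[i] >= min)
                return batch[i] % limit;
    }
}
```

Since fewer than half of all 32-bit values are ever rejected, the chance that all eight candidates in a batch fail is below 1 in 256, so a second system call is rare. Unused candidates are discarded rather than kept, so no state outlives the call.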

Now, spitballing new APIs is kind of beside the point here, as there
are 100 different ways to bikeshed that, but what I'm trying to suggest
is that there's a way of adding what you want to libc without reducing
the quality of it for users, right from the beginning. So why not start
out conservatively?

Or, if you insist on providing these functions t o d a y, and won't heed
my warnings about designing the APIs alongside the implementations, then
just make them thin wrappers over getrandom(0) *without* doing fancy
buffering, and then optimizations later can improve it. That would be
the incremental approach, which wouldn't harm potential users. It also
wouldn't shut the door on doing the buffering: if the kernel
optimization improvements go nowhere, and you decide it's a lost cause,
you can always change the way it works later, and make that decision
then.
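
As a sketch of what such thin wrappers could look like (the thin_ prefix is invented here to avoid clashing with the real symbols, and the abort-on-failure policy is an assumption based on the "cannot report failure" contract discussed earlier in the thread):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>

/* Hypothetical unbuffered arc4random_buf(): every request goes
   straight to getrandom(0), which blocks until the RNG is seeded. */
void thin_arc4random_buf(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t r = getrandom(p, len, 0);

        if (r < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            abort();           /* no way to report failure */
        }
        p += r;                /* large requests may be short */
        len -= (size_t)r;
    }
}

/* Hypothetical unbuffered arc4random(). */
uint32_t thin_arc4random(void)
{
    uint32_t v;

    thin_arc4random_buf(&v, sizeof v);
    return v;
}
```

Because no state is kept in the process, there is nothing to invalidate on fork or VM resume; the price is one system call per request.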

Jason


* Re: arc4random - are you sure we want these?
  2022-07-25 13:43         ` Jason A. Donenfeld
@ 2022-07-25 13:58           ` Cristian Rodríguez
  2022-07-25 16:06           ` Rich Felker
  1 sibling, 0 replies; 81+ messages in thread
From: Cristian Rodríguez @ 2022-07-25 13:58 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Florian Weimer, Yann Droneaud, jann,
	Jason A. Donenfeld via Libc-alpha, linux-crypto, Michael

On Mon, Jul 25, 2022 at 9:44 AM Jason A. Donenfeld via Libc-alpha
<libc-alpha@sourceware.org> wrote:

> Or, if you insist on providing these functions t o d a y, and won't heed
> my warnings about designing the APIs alongside the implementations, then
> just make them thin wrappers over getrandom(0) *without* doing fancy
> buffering, and then optimizations later can improve it. That would be
> the incremental approach, which wouldn't harm potential users. It also
> wouldn't shut the door on doing the buffering: if the kernel
> optimization improvements go nowhere, and you decide it's a lost cause,
> you can always change the way it works later, and make that decision
> then.

My 2 CLP here, if that matters: I agree with this sentiment/approach.
Provide these functions for source compat, all of which just call
getrandom and abort on failure *for now*,
and then a future iteration can do something about the syscall
overhead with kernel help.


* Re: arc4random - are you sure we want these?
  2022-07-25 13:43         ` Jason A. Donenfeld
  2022-07-25 13:58           ` Cristian Rodríguez
@ 2022-07-25 16:06           ` Rich Felker
  2022-07-25 16:43             ` Florian Weimer
  1 sibling, 1 reply; 81+ messages in thread
From: Rich Felker @ 2022-07-25 16:06 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Florian Weimer, Yann Droneaud, jann,
	Jason A. Donenfeld via Libc-alpha, linux-crypto, Michael

On Mon, Jul 25, 2022 at 03:43:57PM +0200, Jason A. Donenfeld via Libc-alpha wrote:
> Hi Florian,
> 
> On Mon, Jul 25, 2022 at 02:39:24PM +0200, Florian Weimer wrote:
> > Below you suggest to use GRND_INSECURE to avoid deadlocks during
> > booting.  It's documented in the UAPI header as “Return
> > non-cryptographic random bytes”.  I assume it's broadly equivalent to
> > reading from /dev/urandom (which we need to support for backwards
> > compatibility, and currently use to avoid blocking).  This means that we
> > cannot really document the resulting bits as cryptographically strong
> > from an application perspective because the kernel is not willing to
> > make this commitment.
> > Regarding the technical aspect, GRND_INSECURE is somewhat new-ish, but
> > as I wrote above, its UAPI documentation is a bit scary.  Maybe it
> > would be possible to clarify this in the manual pages a bit?  I *assume*
> > that if we are willing to read from /dev/urandom, we can use
> > GRND_INSECURE right away to avoid that fallback path on sufficiently new
> > kernels.  But it would be nice to have confirmation.
> 
> getrandom(GRND_INSECURE) is the same as getrandom(0), except before the
> RNG is seeded, in which case the former will return ~garbage randomness
> while the latter will block. The only current difference between
> getrandom(GRND_INSECURE) and /dev/urandom is the latter will try for a
> second to do the jitter entropy thing if the RNG isn't seeded yet.
> 
> I agree that the documentation around this is really bad. Actually, so
> much of the documentation is out of date or confusing. Thanks for the
> kick on this: I really do need to rewrite that / clean it up.
> 
> So with my random.c maintainer hat on: getrandom(GRND_INSECURE) will
> return the same "quality" randomness as getrandom(0), except before
> the RNG is initialized. I'll fix up the docs for that, but feel free to
> refer to this statement ahead of that if you need.
> 
> Code-wise, the only relevant branch related to GRND_INSECURE is:
> 
> 	if (!crng_ready() && !(flags & GRND_INSECURE)) {
> 		if (flags & GRND_NONBLOCK)
> 			return -EAGAIN;
> 		ret = wait_for_random_bytes();
> 		if (unlikely(ret))
> 			return ret;
> 	}
> 
> That means: if it's not ready, and you didn't pass _INSECURE, and you
> didn't pass _NONBLOCK, then wait for the RNG to be ready, and error out
> if that's interrupted by a signal. Other than that one block, it
> continues on to do the same thing as getrandom(0).
> 
> With that said, however, I think it'd be nice if you used only blocking
> randomness, and shove the initialization problem at init systems and
> bootloaders and such. In 5.20, for example, there'll be an x86 boot
> protocol for GRUB and kexec and hypervisors and such to pass a seed, and
> since a long time, there exists a device tree attribute for the same.
> Proliferating "unsafe" /dev/urandom-style usage doesn't seem good for
> the ecosystem at large. And I'm in general interested in seeing progress
> on decades-long initialization-time seeding concerns.

arc4random's contract is supposed to be that it always succeeds and
always produces cryptographic output. It cannot use GRND_INSECURE or
other insecure fallback methods to avoid blocking. It has to block.
This function (inherently, in its contract) is not usable for early
boot stuff where one is pretending to want actual cryptographic
entropy but is just as happy getting some "high quality" non-CS stuff,
and thereby would be just as happy with rand() or likely even with
"42". Programs that will run in that context on Linux need to be
explicitly aware of the messy "early boot" situation and figure out
how they're going to handle it securely or if they even wanted CS
randomness to begin with. Fortunately virtually nothing has to do
that. On most (non-embedded) systems, init can just bring up a rw
filesystem with saved entropy on it early and load that, then provide
a fully-working environment to programs it invokes.

> > I had some patches with AT_RANDOM fallback, including overwriting
> > AT_RANDOM with output from the seeded PRNG.  It's certainly messy.  I
> > probably didn't bother to post these patches given how bizarre the whole
> > thing was.  I did have fallback to CPU instructions, but that turned out
> > to be unworkable due to bugs in suspend on AMD CPUs (kernel or firmware,
> > unclear).
> 
> Yea, it's kind of tricky as other things might be using AT_RANDOM also
> and then you have a whole race issue and domain separation and whatnot.
> The thing in systemd isn't really good for crypto -- no forward secrecy
> and such -- but it's ostensibly better than random().

AT_RANDOM is unusable as a fallback here because it's equivalent to
GRND_INSECURE. It's silently broken at early boot time. In musl we're
likely going to end up using the legacy SYS_sysctl on pre-getrandom
kernels even though it spammed syslog just because it seems to be the
only way to get blocking secure entropy on those kernels.

Rich


* Re: arc4random - are you sure we want these?
  2022-07-25 16:06           ` Rich Felker
@ 2022-07-25 16:43             ` Florian Weimer
  0 siblings, 0 replies; 81+ messages in thread
From: Florian Weimer @ 2022-07-25 16:43 UTC (permalink / raw)
  To: Rich Felker
  Cc: Jason A. Donenfeld, Yann Droneaud, jann,
	Jason A. Donenfeld via Libc-alpha, linux-crypto, Michael

* Rich Felker:

> AT_RANDOM is unusable as a fallback here because it's equivalent to
> GRND_INSECURE. It's silently broken at early boot time. In musl we're
> likely going to end up using the legacy SYS_sysctl on pre-getrandom
> kernels even though it spammed syslog just because it seems to be the
> only way to get blocking secure entropy on those kernels.

Even pre-getrandom, sysctl was rarely enabled in kernel configurations
if I recall correctly.  I doubt it is an option to avoid process
termination with old kernels/seccomp filters.

Thanks,
Florian



* Overwrittting AT_RANDOM after use (was Re: arc4random - are you sure we want these?)
  2022-07-25 12:39       ` Florian Weimer
  2022-07-25 13:43         ` Jason A. Donenfeld
@ 2022-07-26 14:27         ` Yann Droneaud
  2022-07-26 14:35         ` arc4random - are you sure we want these? Yann Droneaud
  2 siblings, 0 replies; 81+ messages in thread
From: Yann Droneaud @ 2022-07-26 14:27 UTC (permalink / raw)
  To: Florian Weimer, Jason A. Donenfeld via Libc-alpha
  Cc: Jason A. Donenfeld, Yann Droneaud, Michael, linux-crypto, jann, dalias

Hi,

On 25/07/2022 at 14:39, Florian Weimer wrote:

> * Jason A. Donenfeld via Libc-alpha:

>>   (After all, I didn't see any wild-n-crazy fallback
>> to AT_RANDOM like what systemd does with random-util.c:
>> https://github.com/systemd/systemd/blob/main/src/basic/random-util.c )
> I had some patches with AT_RANDOM fallback, including overwriting
> AT_RANDOM with output from the seeded PRNG.  It's certainly messy.  I
> probably didn't bother to post these patches given how bizarre the whole
> thing was.

It's not that bizarre as I have some patches too: I tried to harden the 
way stack_chk_guard and pointer_chk_guard were computed.
Those values are currently generated from slices of AT_RANDOM by the loader.

But I've seen programs in the wild reusing AT_RANDOM, thus possibly
leaking stack_chk_guard and pointer_chk_guard values.

Having a proper (CS)PRNG in the loader, initialized from AT_RANDOM, that
overwrites AT_RANDOM (with fresh entropy if possible) after
initialization, would improve the situation for programs that misuse
AT_RANDOM.
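
For context, this is roughly how a program gets at those bytes today; a minimal sketch using glibc's getauxval(), with copy_at_random as a made-up helper name. The kernel supplies 16 bytes once at exec time, so every reader in the process sees the same value, including the material the guards were derived from.

```c
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>

/* Hypothetical helper: copy the 16 bytes the kernel placed at
   AT_RANDOM.  Returns 0 on success, -1 if the entry is missing. */
int copy_at_random(unsigned char out[16])
{
    const unsigned char *p = (const unsigned char *)getauxval(AT_RANDOM);

    if (p == NULL)
        return -1;
    memcpy(out, p, 16);        /* same bytes on every call */
    return 0;
}
```

Two calls return identical data, which is why a program that prints or transmits these bytes may be disclosing its own guard values.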

Regards.

-- 

Yann Droneaud

OPTEYA


* Re: arc4random - are you sure we want these?
  2022-07-25 12:39       ` Florian Weimer
  2022-07-25 13:43         ` Jason A. Donenfeld
  2022-07-26 14:27         ` Overwrittting AT_RANDOM after use (was Re: arc4random - are you sure we want these?) Yann Droneaud
@ 2022-07-26 14:35         ` Yann Droneaud
  2 siblings, 0 replies; 81+ messages in thread
From: Yann Droneaud @ 2022-07-26 14:35 UTC (permalink / raw)
  To: Florian Weimer, Jason A. Donenfeld via Libc-alpha
  Cc: Jason A. Donenfeld, Michael, linux-crypto, jann

Hi,

On 25/07/2022 at 14:39, Florian Weimer wrote:
> * Jason A. Donenfeld via Libc-alpha:
>>> The performance numbers suggest that we benefit from buffering in user
>>> space.
>> The question is whether it's safe and advisable to buffer this way in
>> userspace. Does userspace have the right information now of when to
>> discard the buffer and get a new one? I suspect it does not.
> Not completely, no, but we can cover many cases.  I do not currently see
> a way around that if we want to promote arc4random_uniform(limit) as a
> replacement for random() % limit.

+1

That's the reason I reviewed the implementation positively: for me,
arc4random is not about generating secret keys but small integers.
I want to be able to divert developers from
     srand(time(NULL))
     identifier = rand() % 33
to
     identifier = arc4random_uniform(33)

Safe, fast, and reasonably secure.


Regards.


-- 
Yann Droneaud
OPTEYA




* Re: arc4random - are you sure we want these?
  2022-07-25 11:04     ` Jason A. Donenfeld
  2022-07-25 12:39       ` Florian Weimer
@ 2022-07-25 13:25       ` Jeffrey Walton
  2022-07-25 13:48         ` Jason A. Donenfeld
  1 sibling, 1 reply; 81+ messages in thread
From: Jeffrey Walton @ 2022-07-25 13:25 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Linux Crypto Mailing List

On Mon, Jul 25, 2022 at 7:08 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>  ...
> > The performance numbers suggest that we benefit from buffering in user
> > space.
>
> The question is whether it's safe and advisable to buffer this way in
> userspace. Does userspace have the right information now of when to
> discard the buffer and get a new one? I suspect it does not.

I _think_ the sharp edge on userspace buffering is generator state.
Most generator threat models I have seen assume the attacker does not
know the generator's state. If buffering occurs in the application,
then it may be easier for an attacker to learn of the generator's
state. If buffering occurs in the kernel, then generator state should
be private from an userspace application's view.

Jeff


* Re: arc4random - are you sure we want these?
  2022-07-25 13:25       ` Jeffrey Walton
@ 2022-07-25 13:48         ` Jason A. Donenfeld
  0 siblings, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-25 13:48 UTC (permalink / raw)
  To: Jeffrey Walton; +Cc: Linux Crypto Mailing List, libc-alpha

Hi Jeffrey,

Please keep libc-alpha@sourceware.org CC'd.

On Mon, Jul 25, 2022 at 09:25:58AM -0400, Jeffrey Walton wrote:
> On Mon, Jul 25, 2022 at 7:08 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >  ...
> > > The performance numbers suggest that we benefit from buffering in user
> > > space.
> >
> > The question is whether it's safe and advisable to buffer this way in
> > userspace. Does userspace have the right information now of when to
> > discard the buffer and get a new one? I suspect it does not.
> 
> I _think_ the sharp edge on userspace buffering is generator state.
> Most generator threat models I have seen assume the attacker does not
> know the generator's state. If buffering occurs in the application,
> then it may be easier for an attacker to learn of the generator's
> state. If buffering occurs in the kernel, then generator state should
> be private from an userspace application's view.

I guess that's one concern, if you're worried about heartbleed-like
attacks, in which an undetected RNG state compromise might be easier to
pull off.

What I have in mind, though, are the various triggers and heuristics
that the kernel uses for when it needs to reseed. Userspace doesn't
know about these.

Jason


* Re: arc4random - are you sure we want these?
  2022-07-25 10:11   ` Florian Weimer
  2022-07-25 11:04     ` Jason A. Donenfeld
@ 2022-07-25 14:56     ` Rich Felker
  1 sibling, 0 replies; 81+ messages in thread
From: Rich Felker @ 2022-07-25 14:56 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Jason A. Donenfeld via Libc-alpha, Yann Droneaud, jann,
	Jason A. Donenfeld, Michael

On Mon, Jul 25, 2022 at 12:11:27PM +0200, Florian Weimer via Libc-alpha wrote:
> * Jason A. Donenfeld via Libc-alpha:
> 
> > I really wonder whether this is a good idea, whether this is something
> > that glibc wants, and whether it's a design worth committing to in the
> > long term.
> 
> Do you object to the interface, or the implementation?

That was *exactly* my first question too.

> The implementation can be improved easily enough at a later date.
> 
> > Firstly, for what use cases does this actually help? As of recent
> > changes to the Linux kernels -- now backported all the way to 4.9! --
> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> > states locklessly. Sure you avoid a syscall by doing that in userspace,
> > but does it really matter? Who exactly benefits from this?
> 
> getrandom may be fast for bulk generation.  It's not that great for
> generating a few bits here and there.  For example, shuffling a
> 1,000-element array takes 18 microseconds with arc4random_uniform in
> glibc, and 255 microseconds with the naïve getrandom-based
> implementation (with slightly biased results; measured on an Intel
> i9-10900T, Fedora's kernel-5.18.11-100.fc35.x86_64).
> 
> > You miss out on this with arc4random, and if that information _is_ to be
> > exported to userspace somehow in the future, it would be awfully nice to
> > design the userspace interface alongside the kernel one.
> 
> What is the kernel interface you are talking about?  From an interface
> standpoint, arc4random_buf and getrandom are very similar, the main
> difference being that arc4random_buf cannot report failure (except by
> terminating the process).
> 
> > Seen from this perspective, going with OpenBSD's older paradigm might be
> > rather limiting. Why not work together, between the kernel and libc, to
> > see if we can come up with something better, before settling on an
> > interface with semantics that are hard to walk back later?
> 
> Historically, kernel developers were not interested in solving some of
> the hard problems (especially early seeding) that prevent the use of
> getrandom during early userspace stages.
> 
> > As-is, it's hard to recommend that anybody really use these functions.
> > Just keep using getrandom(2), which has mostly favorable semantics.
> 
> Some applications still need to run in configurations where getrandom is
> not available (either because the kernel is too old, or because it has
> been disabled via seccomp).
> 
> > Yes, I get it: it's fun to make a random number generator, and so lots
> > of projects figure out some way to make yet another one somewhere
> > somehow. But the tendency to do so feels like a weird computer tinkerer
> disease rather than something that has ever helped the overall ecosystem.
> 
> The performance numbers suggest that we benefit from buffering in user
> space.  It might not be necessary to implement expansion in userspace.
> getrandom (or /dev/urandom) with a moderately-sized buffer could be
> sufficient.

FWIW I'd rather have a few kB of shareable entropy-expansion .text in
userspace than a few kB per process (or even per thread? >_<) of
nonshareable data any day.

> But that's an implementation detail, and something we can revisit later.
> If we get vDSO acceleration for getrandom (maybe using the userspace
> thread-specific data donation we discussed for rseq), we might
> eventually do away with the buffering in glibc.  Again this is an
> implementation detail we can change easily enough.

Exactly.

FWIW I've been kinda waiting to see what glibc would do on this after
the posix_random proposal failed, before considering much what we
should do in musl, but the value I see in either is not as an
optimization but as honoring a well-known interface so we have fewer
applications doing their own stupid YOLO stuff trying to get secure
entropy and botching it. So far the best we have is getentropy but it
fails on old kernels. At some point musl will probably implement both
arc4random and getentropy with a secure fallback process for old
kernels -- certainly the fallback is needed for meeting the arc4random
contract and I'd like it in both places.

Rich


* [PATCH] arc4random: simplify design for better safety
  2022-07-23 16:25 ` arc4random - are you sure we want these? Jason A. Donenfeld
                     ` (3 preceding siblings ...)
  2022-07-25 10:11   ` Florian Weimer
@ 2022-07-25 22:57   ` Jason A. Donenfeld
  2022-07-25 23:11     ` Jason A. Donenfeld
                       ` (2 more replies)
  4 siblings, 3 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-25 22:57 UTC (permalink / raw)
  To: libc-alpha
  Cc: Jason A. Donenfeld, Adhemerval Zanella Netto, Florian Weimer,
	Cristian Rodríguez, Paul Eggert, linux-crypto

Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc can work
together on optimizing that.

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering
altogether. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.

This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things. Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fall back to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, but it's the best approximation
we can do.
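
As a standalone illustration of the flow described above, here is a rough
sketch using only public libc calls rather than glibc's internal
__*_nocancel helpers. The names wait_for_seeded_rng and sketch_random_buf
are hypothetical. Unlike the patch below, the sketch does not cache whether
getrandom() exists or whether /dev/random has already been polled, and it
retries on EINTR, which is a choice made for this sketch only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: block until /dev/random is readable, used as a
   rough signal that the kernel RNG has been initialized. */
static void wait_for_seeded_rng(void)
{
    struct pollfd pfd = { .events = POLLIN };
    int ret;

    pfd.fd = open("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY);
    if (pfd.fd < 0)
        abort();
    do
        ret = poll(&pfd, 1, -1);
    while (ret < 0 && errno == EINTR);
    if (ret < 0)
        abort();
    close(pfd.fd);
}

/* Fill buf with len random bytes: getrandom(flags = 0) in a loop, and
   /dev/urandom only when the syscall does not exist (ENOSYS). */
void sketch_random_buf(void *buf, size_t len)
{
    unsigned char *p = buf;
    int fd;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t got = getrandom(p, len, 0);

        if (got > 0) {              /* possibly a short read; keep going */
            p += got;
            len -= (size_t) got;
        } else if (got < 0 && errno == EINTR) {
            continue;               /* interrupted before any bytes arrived */
        } else if (got < 0 && errno == ENOSYS) {
            break;                  /* no syscall; use the fallback below */
        } else {
            abort();                /* unexpected failure */
        }
    }
    if (len == 0)
        return;

    wait_for_seeded_rng();
    fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY);
    if (fd < 0)
        abort();
    while (len > 0) {
        ssize_t got = read(fd, p, len);

        if (got < 0 && errno == EINTR)
            continue;
        if (got <= 0)
            abort();
        p += got;
        len -= (size_t) got;
    }
    close(fd);
}
```

A caller would use it the same way as arc4random_buf, for example
sketch_random_buf(key, sizeof key); there is no error return, so any
failure to obtain random bytes terminates the process.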

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 LICENSES                                      |  23 -
 include/stdlib.h                              |   3 -
 stdlib/Makefile                               |   2 -
 stdlib/arc4random.c                           | 205 ++-----
 stdlib/arc4random.h                           |  48 --
 stdlib/chacha20.c                             | 191 ------
 stdlib/tst-arc4random-chacha20.c              | 167 -----
 sysdeps/aarch64/Makefile                      |   4 -
 sysdeps/aarch64/chacha20-aarch64.S            | 314 ----------
 sysdeps/aarch64/chacha20_arch.h               |  40 --
 sysdeps/generic/chacha20_arch.h               |  24 -
 sysdeps/generic/tls-internal.c                |  10 -
 sysdeps/mach/hurd/_Fork.c                     |   2 -
 sysdeps/nptl/_Fork.c                          |   2 -
 .../powerpc/powerpc64/be/multiarch/Makefile   |   4 -
 .../powerpc64/be/multiarch/chacha20-ppc.c     |   1 -
 .../powerpc64/be/multiarch/chacha20_arch.h    |  42 --
 sysdeps/powerpc/powerpc64/power8/Makefile     |   5 -
 .../powerpc/powerpc64/power8/chacha20-ppc.c   | 256 --------
 .../powerpc/powerpc64/power8/chacha20_arch.h  |  37 --
 sysdeps/s390/s390-64/Makefile                 |   6 -
 sysdeps/s390/s390-64/chacha20-s390x.S         | 573 ------------------
 sysdeps/s390/s390-64/chacha20_arch.h          |  45 --
 sysdeps/unix/sysv/linux/tls-internal.c        |  10 -
 sysdeps/x86_64/Makefile                       |   7 -
 sysdeps/x86_64/chacha20-amd64-avx2.S          | 328 ----------
 sysdeps/x86_64/chacha20-amd64-sse2.S          | 311 ----------
 sysdeps/x86_64/chacha20_arch.h                |  55 --
 28 files changed, 52 insertions(+), 2663 deletions(-)
 delete mode 100644 stdlib/arc4random.h
 delete mode 100644 stdlib/chacha20.c
 delete mode 100644 stdlib/tst-arc4random-chacha20.c
 delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S
 delete mode 100644 sysdeps/aarch64/chacha20_arch.h
 delete mode 100644 sysdeps/generic/chacha20_arch.h
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
 delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
 delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
 delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
 delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
 delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
 delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
 delete mode 100644 sysdeps/x86_64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index cd04fb6e84..530893b1dc 100644
--- a/LICENSES
+++ b/LICENSES
@@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov>
  You should have received a copy of the GNU Lesser General Public
  License along with this library; if not, see
  <https://www.gnu.org/licenses/>.  */
-\f
-sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S,
-sysdeps/x86_64/chacha20-amd64-avx2.S, and
-sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and
-sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt,
-with the following notices:
-
-Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-This file is part of Libgcrypt.
-
-Libgcrypt is free software; you can redistribute it and/or modify
-it under the terms of the GNU Lesser General Public License as
-published by the Free Software Foundation; either version 2.1 of
-the License, or (at your option) any later version.
-
-Libgcrypt is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU Lesser General Public License for more details.
-
-You should have received a copy of the GNU Lesser General Public
-License along with this program; if not, see <https://www.gnu.org/licenses/>.
diff --git a/include/stdlib.h b/include/stdlib.h
index cae7f7cdf8..db51f4a4f6 100644
--- a/include/stdlib.h
+++ b/include/stdlib.h
@@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform;
 libc_hidden_proto (__arc4random_uniform);
 extern void __arc4random_buf_internal (void *buffer, size_t len)
      attribute_hidden;
-/* Called from the fork function to reinitialize the internal cipher state
-   in child process.  */
-extern void __arc4random_fork_subprocess (void) attribute_hidden;
 
 extern double __strtod_internal (const char *__restrict __nptr,
 				 char **__restrict __endptr, int __group)
diff --git a/stdlib/Makefile b/stdlib/Makefile
index a900962685..f7b25c1981 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -246,7 +246,6 @@ tests := \
   # tests
 
 tests-internal := \
-  tst-arc4random-chacha20 \
   tst-strtod1i \
   tst-strtod3 \
   tst-strtod4 \
@@ -256,7 +255,6 @@ tests-internal := \
   # tests-internal
 
 tests-static := \
-  tst-arc4random-chacha20 \
   tst-secure-getenv \
   # tests-static
 
diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
index 65547e79aa..23a4167987 100644
--- a/stdlib/arc4random.c
+++ b/stdlib/arc4random.c
@@ -1,4 +1,4 @@
-/* Pseudo Random Number Generator based on ChaCha20.
+/* Pseudo Random Number Generator
    Copyright (C) 2022 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,61 +16,14 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <arc4random.h>
 #include <errno.h>
 #include <not-cancel.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include <sys/poll.h>
 #include <sys/mman.h>
 #include <sys/param.h>
 #include <sys/random.h>
-#include <tls-internal.h>
-
-/* arc4random keeps two counters: 'have' is the current valid bytes not yet
-   consumed in 'buf' while 'count' is the maximum number of bytes until a
-   reseed.
-
-   Both the initial seed and reseed try to obtain entropy from the kernel
-   and abort the process if none could be obtained.
-
-   The state 'buf' improves the usage of the cipher calls, allowing to call
-   optimized implementations (if the architecture provides it) and minimize
-   function call overhead.  */
-
-#include <chacha20.c>
-
-/* Called from the fork function to reset the state.  */
-void
-__arc4random_fork_subprocess (void)
-{
-  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
-  if (state != NULL)
-    {
-      explicit_bzero (state, sizeof (*state));
-      /* Force key init.  */
-      state->count = -1;
-    }
-}
-
-/* Return the current thread random state or try to create one if there is
-   none available.  In the case malloc can not allocate a state, arc4random
-   will try to get entropy with arc4random_getentropy.  */
-static struct arc4random_state_t *
-arc4random_get_state (void)
-{
-  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
-  if (state == NULL)
-    {
-      state = malloc (sizeof (struct arc4random_state_t));
-      if (state != NULL)
-	{
-	  /* Force key initialization on first call.  */
-	  state->count = -1;
-	  __glibc_tls_internal ()->rand_state = state;
-	}
-    }
-  return state;
-}
 
 static void
 arc4random_getrandom_failure (void)
@@ -78,106 +31,67 @@ arc4random_getrandom_failure (void)
   __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
 }
 
-static void
-arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen)
+void
+__arc4random_buf (void *p, size_t n)
 {
-  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
-
-  /* Mix optional user provided data.  */
-  if (rnd != NULL)
-    {
-      size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-      for (size_t i = 0; i < m; i++)
-	state->buf[i] ^= rnd[i];
-    }
-
-  /* Immediately reinit for backtracking resistance.  */
-  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
-  explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-}
+  static bool have_getrandom = true, seen_initialized = false;
+  int fd;
 
-static void
-arc4random_getentropy (void *rnd, size_t len)
-{
-  if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len)
+  if (n == 0)
     return;
 
-  int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom",
-						  O_RDONLY | O_CLOEXEC));
-  if (fd != -1)
+  for (;;)
     {
-      uint8_t *p = rnd;
-      uint8_t *end = p + len;
-      do
-	{
-	  ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
-	  if (ret <= 0)
-	    arc4random_getrandom_failure ();
-	  p += ret;
-	}
-      while (p < end);
-
-      if (__close_nocancel (fd) == 0)
-	return;
+      ssize_t l;
+
+      if (!have_getrandom)
+        break;
+
+      l = __getrandom_nocancel (p, n, 0);
+      if (l > 0)
+        {
+          if ((size_t) l == n)
+              return; /* Done reading, success. */
+          p = (uint8_t *) p + l;
+          n -= l;
+          continue; /* Interrupted by a signal; keep going. */
+        }
+      else if (l == 0)
+        arc4random_getrandom_failure (); /* Weird, should never happen. */
+      else if (errno == ENOSYS)
+        {
+          have_getrandom = false;
+          break; /* No syscall, so fallback to /dev/urandom. */
+        }
+      arc4random_getrandom_failure (); /* Unknown other error, should never happen. */
     }
-  arc4random_getrandom_failure ();
-}
 
-/* Check if the thread context STATE should be reseed with kernel entropy
-   depending of requested LEN bytes.  If there is less than requested,
-   the state is either initialized or reseeded, otherwise the internal
-   counter subtract the requested length.  */
-static void
-arc4random_check_stir (struct arc4random_state_t *state, size_t len)
-{
-  if (state->count <= len || state->count == -1)
+  if (!seen_initialized)
     {
-      uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
-      arc4random_getentropy (rnd, sizeof rnd);
-
-      if (state->count == -1)
-	chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
-      else
-	arc4random_rekey (state, rnd, sizeof rnd);
-
-      explicit_bzero (rnd, sizeof rnd);
-
-      /* Invalidate the buf.  */
-      state->have = 0;
-      memset (state->buf, 0, sizeof state->buf);
-      state->count = CHACHA20_RESEED_SIZE;
+      struct pollfd pfd = { .events = POLLIN };
+      pfd.fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
+      if (pfd.fd < 0)
+        arc4random_getrandom_failure ();
+      if (__poll(&pfd, 1, -1) < 0)
+        arc4random_getrandom_failure ();
+      if (__close_nocancel(pfd.fd) < 0)
+        arc4random_getrandom_failure ();
+      seen_initialized = true;
     }
-  else
-    state->count -= len;
-}
 
-void
-__arc4random_buf (void *buffer, size_t len)
-{
-  struct arc4random_state_t *state = arc4random_get_state ();
-  if (__glibc_unlikely (state == NULL))
-    {
-      arc4random_getentropy (buffer, len);
-      return;
-    }
-
-  arc4random_check_stir (state, len);
-  while (len > 0)
+  fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY);
+  if (fd < 0)
+    arc4random_getrandom_failure ();
+  while (n)
     {
-      if (state->have > 0)
-	{
-	  size_t m = MIN (len, state->have);
-	  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
-	  memcpy (buffer, ks, m);
-	  explicit_bzero (ks, m);
-	  buffer += m;
-	  len -= m;
-	  state->have -= m;
-	}
-      if (state->have == 0)
-	arc4random_rekey (state, NULL, 0);
+      ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
+      if (l <= 0)
+        arc4random_getrandom_failure ();
+      p = (uint8_t *) p + l;
+      n -= l;
     }
+  if (__close_nocancel (fd) < 0)
+    arc4random_getrandom_failure ();
 }
 libc_hidden_def (__arc4random_buf)
 weak_alias (__arc4random_buf, arc4random_buf)
@@ -186,22 +100,7 @@ uint32_t
 __arc4random (void)
 {
   uint32_t r;
-
-  struct arc4random_state_t *state = arc4random_get_state ();
-  if (__glibc_unlikely (state == NULL))
-    {
-      arc4random_getentropy (&r, sizeof (uint32_t));
-      return r;
-    }
-
-  arc4random_check_stir (state, sizeof (uint32_t));
-  if (state->have < sizeof (uint32_t))
-    arc4random_rekey (state, NULL, 0);
-  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
-  memcpy (&r, ks, sizeof (uint32_t));
-  memset (ks, 0, sizeof (uint32_t));
-  state->have -= sizeof (uint32_t);
-
+  __arc4random_buf(&r, sizeof(r));
   return r;
 }
 libc_hidden_def (__arc4random)
diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
deleted file mode 100644
index cd39389c19..0000000000
--- a/stdlib/arc4random.h
+++ /dev/null
@@ -1,48 +0,0 @@
-/* Arc4random definition used on TLS.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#ifndef _CHACHA20_H
-#define _CHACHA20_H
-
-#include <stddef.h>
-#include <stdint.h>
-
-/* Internal ChaCha20 state.  */
-#define CHACHA20_STATE_LEN	16
-#define CHACHA20_BLOCK_SIZE	64
-
-/* Maximum number bytes until reseed (16 MB).  */
-#define CHACHA20_RESEED_SIZE	(16 * 1024 * 1024)
-
-/* Internal arc4random buffer, used on each feedback step so offer some
-   backtracking protection and to allow better used of vectorized
-   chacha20 implementations.  */
-#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
-
-_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE,
-		"CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE");
-
-struct arc4random_state_t
-{
-  uint32_t ctx[CHACHA20_STATE_LEN];
-  size_t have;
-  size_t count;
-  uint8_t buf[CHACHA20_BUFSIZE];
-};
-
-#endif
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
deleted file mode 100644
index 2745a81315..0000000000
--- a/stdlib/chacha20.c
+++ /dev/null
@@ -1,191 +0,0 @@
-/* Generic ChaCha20 implementation (used on arc4random).
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <array_length.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <string.h>
-
-/* 32-bit stream position, then 96-bit nonce.  */
-#define CHACHA20_IV_SIZE	16
-#define CHACHA20_KEY_SIZE	32
-
-#define CHACHA20_STATE_LEN	16
-
-/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final
-   XOR of the keystream with the plaintext because the plaintext is a
-   stream of zeros.  */
-
-enum chacha20_constants
-{
-  CHACHA20_CONSTANT_EXPA = 0x61707865U,
-  CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
-  CHACHA20_CONSTANT_2_BY = 0x79622d32U,
-  CHACHA20_CONSTANT_TE_K = 0x6b206574U
-};
-
-static inline uint32_t
-read_unaligned_32 (const uint8_t *p)
-{
-  uint32_t r;
-  memcpy (&r, p, sizeof (r));
-  return r;
-}
-
-static inline void
-write_unaligned_32 (uint8_t *p, uint32_t v)
-{
-  memcpy (p, &v, sizeof (v));
-}
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
-# define set_state(v)		__builtin_bswap32 ((v))
-#else
-# define read_unaligned_le32(p) read_unaligned_32 ((p))
-# define set_state(v)		(v)
-#endif
-
-static inline void
-chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
-{
-  state[0]  = CHACHA20_CONSTANT_EXPA;
-  state[1]  = CHACHA20_CONSTANT_ND_3;
-  state[2]  = CHACHA20_CONSTANT_2_BY;
-  state[3]  = CHACHA20_CONSTANT_TE_K;
-
-  state[4]  = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
-  state[5]  = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
-  state[6]  = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
-  state[7]  = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
-  state[8]  = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
-  state[9]  = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
-  state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
-  state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
-
-  state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
-  state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
-  state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
-  state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
-}
-
-static inline uint32_t
-rotl32 (unsigned int shift, uint32_t word)
-{
-  return (word << (shift & 31)) | (word >> ((-shift) & 31));
-}
-
-static void
-state_final (const uint8_t *src, uint8_t *dst, uint32_t v)
-{
-#ifdef CHACHA20_XOR_FINAL
-  v ^= read_unaligned_32 (src);
-#endif
-  write_unaligned_32 (dst, v);
-}
-
-static inline void
-chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src)
-{
-  uint32_t x0, x1, x2, x3, x4, x5, x6, x7;
-  uint32_t x8, x9, x10, x11, x12, x13, x14, x15;
-
-  x0 = state[0];
-  x1 = state[1];
-  x2 = state[2];
-  x3 = state[3];
-  x4 = state[4];
-  x5 = state[5];
-  x6 = state[6];
-  x7 = state[7];
-  x8 = state[8];
-  x9 = state[9];
-  x10 = state[10];
-  x11 = state[11];
-  x12 = state[12];
-  x13 = state[13];
-  x14 = state[14];
-  x15 = state[15];
-
-  for (int i = 0; i < 20; i += 2)
-    {
-#define QROUND(_x0, _x1, _x2, _x3) 			\
-  do {							\
-   _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); 	\
-   _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); 	\
-   _x0 = _x0 + _x1; _x3 = rotl32 (8,  (_x0 ^ _x3));	\
-   _x2 = _x2 + _x3; _x1 = rotl32 (7,  (_x1 ^ _x2));	\
-  } while(0)
-
-      QROUND (x0, x4, x8,  x12);
-      QROUND (x1, x5, x9,  x13);
-      QROUND (x2, x6, x10, x14);
-      QROUND (x3, x7, x11, x15);
-
-      QROUND (x0, x5, x10, x15);
-      QROUND (x1, x6, x11, x12);
-      QROUND (x2, x7, x8,  x13);
-      QROUND (x3, x4, x9,  x14);
-    }
-
-  state_final (&src[0], &dst[0], set_state (x0 + state[0]));
-  state_final (&src[4], &dst[4], set_state (x1 + state[1]));
-  state_final (&src[8], &dst[8], set_state (x2 + state[2]));
-  state_final (&src[12], &dst[12], set_state (x3 + state[3]));
-  state_final (&src[16], &dst[16], set_state (x4 + state[4]));
-  state_final (&src[20], &dst[20], set_state (x5 + state[5]));
-  state_final (&src[24], &dst[24], set_state (x6 + state[6]));
-  state_final (&src[28], &dst[28], set_state (x7 + state[7]));
-  state_final (&src[32], &dst[32], set_state (x8 + state[8]));
-  state_final (&src[36], &dst[36], set_state (x9 + state[9]));
-  state_final (&src[40], &dst[40], set_state (x10 + state[10]));
-  state_final (&src[44], &dst[44], set_state (x11 + state[11]));
-  state_final (&src[48], &dst[48], set_state (x12 + state[12]));
-  state_final (&src[52], &dst[52], set_state (x13 + state[13]));
-  state_final (&src[56], &dst[56], set_state (x14 + state[14]));
-  state_final (&src[60], &dst[60], set_state (x15 + state[15]));
-
-  state[12]++;
-}
-
-static void
-__attribute_maybe_unused__
-chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
-			size_t bytes)
-{
-  while (bytes >= CHACHA20_BLOCK_SIZE)
-    {
-      chacha20_block (state, dst, src);
-
-      bytes -= CHACHA20_BLOCK_SIZE;
-      dst += CHACHA20_BLOCK_SIZE;
-      src += CHACHA20_BLOCK_SIZE;
-    }
-
-  if (__glibc_unlikely (bytes != 0))
-    {
-      uint8_t stream[CHACHA20_BLOCK_SIZE];
-      chacha20_block (state, stream, src);
-      memcpy (dst, stream, bytes);
-      explicit_bzero (stream, sizeof stream);
-    }
-}
-
-/* Get the architecture optimized version.  */
-#include <chacha20_arch.h>
diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c
deleted file mode 100644
index 45ba54920d..0000000000
--- a/stdlib/tst-arc4random-chacha20.c
+++ /dev/null
@@ -1,167 +0,0 @@
-/* Basic tests for chacha20 cypher used in arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <arc4random.h>
-#include <support/check.h>
-#include <sys/cdefs.h>
-
-/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random
-   actual does.  */
-#include <chacha20.c>
-
-static int
-do_test (void)
-{
-  const uint8_t key[CHACHA20_KEY_SIZE] =
-    {
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-    };
-  const uint8_t iv[CHACHA20_IV_SIZE] =
-    {
-      0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-    };
-  const uint8_t expected1[CHACHA20_BUFSIZE] =
-    {
-      0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a,
-      0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d,
-      0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda,
-      0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f,
-      0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1,
-      0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07,
-      0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73,
-      0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69,
-      0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21,
-      0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8,
-      0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac,
-      0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6,
-      0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0,
-      0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3,
-      0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f,
-      0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e,
-      0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71,
-      0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3,
-      0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5,
-      0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34,
-      0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d,
-      0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c,
-      0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31,
-      0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad,
-      0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0,
-      0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72,
-      0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16,
-      0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9,
-      0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a,
-      0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc,
-      0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2,
-      0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb,
-      0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10,
-      0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b,
-      0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2,
-      0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17,
-      0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca,
-      0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84,
-      0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3,
-      0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1,
-      0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7,
-      0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9,
-      0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3,
-      0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e,
-      0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb,
-      0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8,
-      0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb
-    };
-
-  const uint8_t expected2[CHACHA20_BUFSIZE] =
-    {
-      0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36,
-      0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22,
-      0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50,
-      0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10,
-      0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef,
-      0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30,
-      0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c,
-      0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6,
-      0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e,
-      0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90,
-      0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3,
-      0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba,
-      0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4,
-      0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea,
-      0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a,
-      0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80,
-      0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2,
-      0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8,
-      0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce,
-      0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb,
-      0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9,
-      0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc,
-      0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86,
-      0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0,
-      0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4,
-      0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1,
-      0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4,
-      0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f,
-      0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57,
-      0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97,
-      0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef,
-      0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13,
-      0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65,
-      0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30,
-      0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6,
-      0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee,
-      0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c,
-      0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66,
-      0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a,
-      0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c,
-      0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65,
-      0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b,
-      0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88,
-      0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1,
-      0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0,
-      0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36,
-      0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18
-    };
-
-  /* Check with the expected internal arc4random keystream buffer.  Some
-     architecture optimizations expects a buffer with a minimum size which
-     is a multiple of then ChaCha20 blocksize, so they might not be prepared
-     to handle smaller buffers.  */
-
-  uint8_t output[CHACHA20_BUFSIZE];
-
-  uint32_t state[CHACHA20_STATE_LEN];
-  chacha20_init (state, key, iv);
-
-  /* Check with the initial state.  */
-  uint8_t input[CHACHA20_BUFSIZE] = { 0 };
-
-  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
-  TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE);
-
-  /* And on the next round.  */
-  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
-  TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE);
-
-  return 0;
-}
-
-#include <support/test-driver.c>
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 7dfd1b62dd..17fb1c5b72 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -51,10 +51,6 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += tlsdesc.sym
 endif
 
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-aarch64
-endif
-
 ifeq ($(subdir),gmon)
 CFLAGS-mcount.c += -mgeneral-regs-only
 endif
diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S
deleted file mode 100644
index cce5291c5c..0000000000
--- a/sysdeps/aarch64/chacha20-aarch64.S
+++ /dev/null
@@ -1,314 +0,0 @@
-/* Optimized AArch64 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-
-/* Only LE is supported.  */
-#ifdef __AARCH64EL__
-
-#define GET_DATA_POINTER(reg, name) \
-        adrp    reg, name ; \
-        add     reg, reg, :lo12:name
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
-        ret; dsb sy; isb;
-
-.cpu generic+simd
-
-.text
-
-/* register macros */
-#define INPUT     x0
-#define DST       x1
-#define SRC       x2
-#define NBLKS     x3
-#define ROUND     x4
-#define INPUT_CTR x5
-#define INPUT_POS x6
-#define CTR       x7
-
-/* vector registers */
-#define X0 v16
-#define X4 v17
-#define X8 v18
-#define X12 v19
-
-#define X1 v20
-#define X5 v21
-
-#define X9 v22
-#define X13 v23
-#define X2 v24
-#define X6 v25
-
-#define X3 v26
-#define X7 v27
-#define X11 v28
-#define X15 v29
-
-#define X10 v30
-#define X14 v31
-
-#define VCTR    v0
-#define VTMP0   v1
-#define VTMP1   v2
-#define VTMP2   v3
-#define VTMP3   v4
-#define X12_TMP v5
-#define X13_TMP v6
-#define ROT8    v7
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-#define _(...) __VA_ARGS__
-
-#define vpunpckldq(s1, s2, dst) \
-	zip1 dst.4s, s2.4s, s1.4s;
-
-#define vpunpckhdq(s1, s2, dst) \
-	zip2 dst.4s, s2.4s, s1.4s;
-
-#define vpunpcklqdq(s1, s2, dst) \
-	zip1 dst.2d, s2.2d, s1.2d;
-
-#define vpunpckhqdq(s1, s2, dst) \
-	zip2 dst.2d, s2.2d, s1.2d;
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
-	vpunpckhdq(x1, x0, t2); \
-	vpunpckldq(x1, x0, x0); \
-	\
-	vpunpckldq(x3, x2, t1); \
-	vpunpckhdq(x3, x2, x2); \
-	\
-	vpunpckhqdq(t1, x0, x1); \
-	vpunpcklqdq(t1, x0, x0); \
-	\
-	vpunpckhqdq(x2, t2, x3); \
-	vpunpcklqdq(x2, t2, x2);
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define XOR(d,s1,s2) \
-	eor d.16b, s2.16b, s1.16b;
-
-#define PLUS(ds,s) \
-	add ds.4s, ds.4s, s.4s;
-
-#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \
-	shl dst1.4s, src1.4s, #(c);		\
-	shl dst2.4s, src2.4s, #(c);		\
-	shl dst3.4s, src3.4s, #(c);		\
-	shl dst4.4s, src4.4s, #(c);		\
-	sri dst1.4s, src1.4s, #(32 - (c));	\
-	sri dst2.4s, src2.4s, #(32 - (c));	\
-	sri dst3.4s, src3.4s, #(32 - (c));	\
-	sri dst4.4s, src4.4s, #(32 - (c));
-
-#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
-	tbl dst1.16b, {src1.16b}, ROT8.16b;     \
-	tbl dst2.16b, {src2.16b}, ROT8.16b;	\
-	tbl dst3.16b, {src3.16b}, ROT8.16b;	\
-	tbl dst4.16b, {src4.16b}, ROT8.16b;
-
-#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
-	rev32 dst1.8h, src1.8h;			\
-	rev32 dst2.8h, src2.8h;			\
-	rev32 dst3.8h, src3.8h;			\
-	rev32 dst4.8h, src4.8h;
-
-#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \
-	PLUS(a1,b1); PLUS(a2,b2);						\
-	PLUS(a3,b3); PLUS(a4,b4);						\
-	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
-	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
-		ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4);		\
-	PLUS(c1,d1); PLUS(c2,d2);						\
-	PLUS(c3,d3); PLUS(c4,d4);						\
-	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
-	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
-		ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4)		\
-	PLUS(a1,b1); PLUS(a2,b2);						\
-	PLUS(a3,b3); PLUS(a4,b4);						\
-	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
-	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
-		ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4)		\
-	PLUS(c1,d1); PLUS(c2,d2);						\
-	PLUS(c3,d3); PLUS(c4,d4);						\
-	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
-	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
-		ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4)		\
-
-.align 4
-L(__chacha20_blocks4_data_inc_counter):
-	.long 0,1,2,3
-
-.align 4
-L(__chacha20_blocks4_data_rot8):
-	.byte 3,0,1,2
-	.byte 7,4,5,6
-	.byte 11,8,9,10
-	.byte 15,12,13,14
-
-.hidden __chacha20_neon_blocks4
-ENTRY (__chacha20_neon_blocks4)
-	/* input:
-	 *	x0: input
-	 *	x1: dst
-	 *	x2: src
-	 *	x3: nblks (multiple of 4)
-	 */
-
-	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8))
-	add INPUT_CTR, INPUT, #(12*4);
-	ld1 {ROT8.16b}, [CTR];
-	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter))
-	mov INPUT_POS, INPUT;
-	ld1 {VCTR.16b}, [CTR];
-
-L(loop4):
-	/* Construct counter vectors X12 and X13 */
-
-	ld1 {X15.16b}, [INPUT_CTR];
-	mov ROUND, #20;
-	ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
-
-	dup X12.4s, X15.s[0];
-	dup X13.4s, X15.s[1];
-	ldr CTR, [INPUT_CTR];
-	add X12.4s, X12.4s, VCTR.4s;
-	dup X0.4s, VTMP1.s[0];
-	dup X1.4s, VTMP1.s[1];
-	dup X2.4s, VTMP1.s[2];
-	dup X3.4s, VTMP1.s[3];
-	dup X14.4s, X15.s[2];
-	cmhi VTMP0.4s, VCTR.4s, X12.4s;
-	dup X15.4s, X15.s[3];
-	add CTR, CTR, #4; /* Update counter */
-	dup X4.4s, VTMP2.s[0];
-	dup X5.4s, VTMP2.s[1];
-	dup X6.4s, VTMP2.s[2];
-	dup X7.4s, VTMP2.s[3];
-	sub X13.4s, X13.4s, VTMP0.4s;
-	dup X8.4s, VTMP3.s[0];
-	dup X9.4s, VTMP3.s[1];
-	dup X10.4s, VTMP3.s[2];
-	dup X11.4s, VTMP3.s[3];
-	mov X12_TMP.16b, X12.16b;
-	mov X13_TMP.16b, X13.16b;
-	str CTR, [INPUT_CTR];
-
-L(round2):
-	subs ROUND, ROUND, #2
-	QUARTERROUND4(X0, X4,  X8, X12,   X1, X5,  X9, X13,
-		      X2, X6, X10, X14,   X3, X7, X11, X15,
-		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
-	QUARTERROUND4(X0, X5, X10, X15,   X1, X6, X11, X12,
-		      X2, X7,  X8, X13,   X3, X4,  X9, X14,
-		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
-	b.ne L(round2);
-
-	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
-
-	PLUS(X12, X12_TMP);        /* INPUT + 12 * 4 + counter */
-	PLUS(X13, X13_TMP);        /* INPUT + 13 * 4 + counter */
-
-	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
-	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
-	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
-	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
-	PLUS(X0, VTMP2);
-	PLUS(X1, VTMP3);
-	PLUS(X2, X12_TMP);
-	PLUS(X3, X13_TMP);
-
-	dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
-	dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
-	dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
-	dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
-	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
-	mov INPUT_POS, INPUT;
-	PLUS(X4, VTMP2);
-	PLUS(X5, VTMP3);
-	PLUS(X6, X12_TMP);
-	PLUS(X7, X13_TMP);
-
-	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
-	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
-	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
-	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
-	dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
-	dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
-	PLUS(X8, VTMP2);
-	PLUS(X9, VTMP3);
-	PLUS(X10, X12_TMP);
-	PLUS(X11, X13_TMP);
-	PLUS(X14, VTMP0);
-	PLUS(X15, VTMP1);
-
-	transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
-
-	subs NBLKS, NBLKS, #4;
-
-	st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64
-	st1 {X1.16b,X5.16b}, [DST], #32;
-	st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64
-	st1 {X10.16b,X14.16b}, [DST], #32;
-	st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64;
-
-	b.ne L(loop4);
-
-	ret_spec_stop
-END (__chacha20_neon_blocks4)
-
-#endif
diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
deleted file mode 100644
index 37dbb917f1..0000000000
--- a/sysdeps/aarch64/chacha20_arch.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <ldsodefs.h>
-#include <stdbool.h>
-
-unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
-#ifdef __AARCH64EL__
-  __chacha20_neon_blocks4 (state, dst, src,
-			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  chacha20_crypt_generic (state, dst, src, bytes);
-#endif
-}
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
deleted file mode 100644
index 1b4559ccbc..0000000000
--- a/sysdeps/generic/chacha20_arch.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* Chacha20 implementation, generic interface for encrypt.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
index 8a0f37d509..b32b31b5a9 100644
--- a/sysdeps/generic/tls-internal.c
+++ b/sysdeps/generic/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -27,13 +26,4 @@ __glibc_tls_internal_free (void)
 {
   free (__tls_internal.strsignal_buf);
   free (__tls_internal.strerror_l_buf);
-
-  if (__tls_internal.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-	 cached it won't leak any data.  */
-      explicit_bzero (__tls_internal.rand_state,
-		      sizeof (*__tls_internal.rand_state));
-      free (__tls_internal.rand_state);
-    }
 }
diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
index 667068c8cf..e60b86fab1 100644
--- a/sysdeps/mach/hurd/_Fork.c
+++ b/sysdeps/mach/hurd/_Fork.c
@@ -662,8 +662,6 @@ retry:
       _hurd_malloc_fork_child ();
       call_function_static_weak (__malloc_fork_unlock_child);
 
-      call_function_static_weak (__arc4random_fork_subprocess);
-
       /* Run things that want to run in the child task to set up.  */
       RUN_HOOK (_hurd_fork_child_hook, ());
 
diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index 7dc02569f6..dd568992e2 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -43,8 +43,6 @@ _Fork (void)
       self->robust_head.list = &self->robust_head;
       INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
 			     sizeof (struct robust_list_head));
-
-      call_function_static_weak (__arc4random_fork_subprocess);
     }
   return pid;
 }
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
deleted file mode 100644
index 8c75165f7f..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
+++ /dev/null
@@ -1,4 +0,0 @@
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
deleted file mode 100644
index cf9e735326..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
deleted file mode 100644
index 08494dc045..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
+++ /dev/null
@@ -1,42 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-					const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-		const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  unsigned long int hwcap = GLRO(dl_hwcap);
-  unsigned long int hwcap2 = GLRO(dl_hwcap2);
-  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
-    __chacha20_power8_blocks4 (state, dst, src,
-			       CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
index abb0aa3f11..71a59529f3 100644
--- a/sysdeps/powerpc/powerpc64/power8/Makefile
+++ b/sysdeps/powerpc/powerpc64/power8/Makefile
@@ -1,8 +1,3 @@
 ifeq ($(subdir),string)
 sysdep_routines += strcasestr-ppc64
 endif
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
deleted file mode 100644
index 0bbdcb9363..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/* Optimized PowerPC implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20
-   Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <altivec.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <sys/cdefs.h>
-
-typedef vector unsigned char vector16x_u8;
-typedef vector unsigned int vector4x_u32;
-typedef vector unsigned long long vector2x_u64;
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-static const vector16x_u8 le_bswap_const =
-  { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
-#endif
-
-static inline vector4x_u32
-vec_rol_elems (vector4x_u32 v, unsigned int idx)
-{
-#if __BYTE_ORDER != __BIG_ENDIAN
-  return vec_sld (v, v, (16 - (4 * idx)) & 15);
-#else
-  return vec_sld (v, v, (4 * idx) & 15);
-#endif
-}
-
-static inline vector4x_u32
-vec_load_le (unsigned long offset, const unsigned char *ptr)
-{
-  vector4x_u32 vec;
-  vec = vec_vsx_ld (offset, (const uint32_t *)ptr);
-#if __BYTE_ORDER == __BIG_ENDIAN
-  vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec,
-				 le_bswap_const);
-#endif
-  return vec;
-}
-
-static inline void
-vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
-  vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec,
-			       le_bswap_const);
-#endif
-  vec_vsx_st (vec, offset, (uint32_t *)ptr);
-}
-
-
-static inline vector4x_u32
-vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
-  static const vector16x_u8 swap32 =
-    { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 };
-  vector2x_u64 vec, add, sum;
-
-  vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32);
-  add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32);
-  sum = vec + add;
-  return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32);
-#else
-  return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a));
-#endif
-}
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define ROTATE(v1,rolv)			\
-	__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv))
-
-#define PLUS(ds,s) \
-	((ds) += (s))
-
-#define XOR(ds,s) \
-	((ds) ^= (s))
-
-#define ADD_U64(v,a) \
-	(v = vec_add_ctr_u64(v, a))
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3) ({ \
-	vector4x_u32 t1 = vec_mergeh(x0, x2); \
-	vector4x_u32 t2 = vec_mergel(x0, x2); \
-	vector4x_u32 t3 = vec_mergeh(x1, x3); \
-	x3 = vec_mergel(x1, x3); \
-	x0 = vec_mergeh(t1, t3); \
-	x1 = vec_mergel(t1, t3); \
-	x2 = vec_mergeh(t2, x3); \
-	x3 = vec_mergel(t2, x3); \
-      })
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2)			\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE(d1, rotate_16); ROTATE(d2, rotate_16);	\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE(b1, rotate_12); ROTATE(b2, rotate_12);	\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE(d1, rotate_8); ROTATE(d2, rotate_8);		\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE(b1, rotate_7); ROTATE(b2, rotate_7);
-
-unsigned int attribute_hidden
-__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src,
-			   size_t nblks)
-{
-  vector4x_u32 counters_0123 = { 0, 1, 2, 3 };
-  vector4x_u32 counter_4 = { 4, 0, 0, 0 };
-  vector4x_u32 rotate_16 = { 16, 16, 16, 16 };
-  vector4x_u32 rotate_12 = { 12, 12, 12, 12 };
-  vector4x_u32 rotate_8 = { 8, 8, 8, 8 };
-  vector4x_u32 rotate_7 = { 7, 7, 7, 7 };
-  vector4x_u32 state0, state1, state2, state3;
-  vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7;
-  vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15;
-  vector4x_u32 tmp;
-  int i;
-
-  /* Force preload of constants to vector registers.  */
-  __asm__ ("": "+v" (counters_0123) :: "memory");
-  __asm__ ("": "+v" (counter_4) :: "memory");
-  __asm__ ("": "+v" (rotate_16) :: "memory");
-  __asm__ ("": "+v" (rotate_12) :: "memory");
-  __asm__ ("": "+v" (rotate_8) :: "memory");
-  __asm__ ("": "+v" (rotate_7) :: "memory");
-
-  state0 = vec_vsx_ld (0 * 16, state);
-  state1 = vec_vsx_ld (1 * 16, state);
-  state2 = vec_vsx_ld (2 * 16, state);
-  state3 = vec_vsx_ld (3 * 16, state);
-
-  do
-    {
-      v0 = vec_splat (state0, 0);
-      v1 = vec_splat (state0, 1);
-      v2 = vec_splat (state0, 2);
-      v3 = vec_splat (state0, 3);
-      v4 = vec_splat (state1, 0);
-      v5 = vec_splat (state1, 1);
-      v6 = vec_splat (state1, 2);
-      v7 = vec_splat (state1, 3);
-      v8 = vec_splat (state2, 0);
-      v9 = vec_splat (state2, 1);
-      v10 = vec_splat (state2, 2);
-      v11 = vec_splat (state2, 3);
-      v12 = vec_splat (state3, 0);
-      v13 = vec_splat (state3, 1);
-      v14 = vec_splat (state3, 2);
-      v15 = vec_splat (state3, 3);
-
-      v12 += counters_0123;
-      v13 -= vec_cmplt (v12, counters_0123);
-
-      for (i = 20; i > 0; i -= 2)
-	{
-	  QUARTERROUND2 (v0, v4,  v8, v12,   v1, v5,  v9, v13)
-	  QUARTERROUND2 (v2, v6, v10, v14,   v3, v7, v11, v15)
-	  QUARTERROUND2 (v0, v5, v10, v15,   v1, v6, v11, v12)
-	  QUARTERROUND2 (v2, v7,  v8, v13,   v3, v4,  v9, v14)
-	}
-
-      v0 += vec_splat (state0, 0);
-      v1 += vec_splat (state0, 1);
-      v2 += vec_splat (state0, 2);
-      v3 += vec_splat (state0, 3);
-      v4 += vec_splat (state1, 0);
-      v5 += vec_splat (state1, 1);
-      v6 += vec_splat (state1, 2);
-      v7 += vec_splat (state1, 3);
-      v8 += vec_splat (state2, 0);
-      v9 += vec_splat (state2, 1);
-      v10 += vec_splat (state2, 2);
-      v11 += vec_splat (state2, 3);
-      tmp = vec_splat( state3, 0);
-      tmp += counters_0123;
-      v12 += tmp;
-      v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123);
-      v14 += vec_splat (state3, 2);
-      v15 += vec_splat (state3, 3);
-      ADD_U64 (state3, counter_4);
-
-      transpose_4x4 (v0, v1, v2, v3);
-      transpose_4x4 (v4, v5, v6, v7);
-      transpose_4x4 (v8, v9, v10, v11);
-      transpose_4x4 (v12, v13, v14, v15);
-
-      vec_store_le (v0, (64 * 0 + 16 * 0), dst);
-      vec_store_le (v1, (64 * 1 + 16 * 0), dst);
-      vec_store_le (v2, (64 * 2 + 16 * 0), dst);
-      vec_store_le (v3, (64 * 3 + 16 * 0), dst);
-
-      vec_store_le (v4, (64 * 0 + 16 * 1), dst);
-      vec_store_le (v5, (64 * 1 + 16 * 1), dst);
-      vec_store_le (v6, (64 * 2 + 16 * 1), dst);
-      vec_store_le (v7, (64 * 3 + 16 * 1), dst);
-
-      vec_store_le (v8, (64 * 0 + 16 * 2), dst);
-      vec_store_le (v9, (64 * 1 + 16 * 2), dst);
-      vec_store_le (v10, (64 * 2 + 16 * 2), dst);
-      vec_store_le (v11, (64 * 3 + 16 * 2), dst);
-
-      vec_store_le (v12, (64 * 0 + 16 * 3), dst);
-      vec_store_le (v13, (64 * 1 + 16 * 3), dst);
-      vec_store_le (v14, (64 * 2 + 16 * 3), dst);
-      vec_store_le (v15, (64 * 3 + 16 * 3), dst);
-
-      src += 4*64;
-      dst += 4*64;
-
-      nblks -= 4;
-    }
-  while (nblks);
-
-  vec_vsx_st (state3, 3 * 16, state);
-
-  return 0;
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
deleted file mode 100644
index ded06762b6..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
+++ /dev/null
@@ -1,37 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-					const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-		const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  __chacha20_power8_blocks4 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-}
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index 96c110f490..66ed844e68 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache
 endif
 
 endif # $(subdir) == elf
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
-  chacha20-s390x \
-  # sysdep_routines
-endif
diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S
deleted file mode 100644
index e38504d370..0000000000
--- a/sysdeps/s390/s390-64/chacha20-s390x.S
+++ /dev/null
@@ -1,573 +0,0 @@
-/* Optimized s390x implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-s390x.S  -  zSeries implementation of ChaCha20 cipher
-
-   Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <sysdep.h>
-
-#ifdef HAVE_S390_VX_ASM_SUPPORT
-
-/* CFA expressions are used for pointing CFA and registers to
- * SP relative offsets. */
-# define DW_REGNO_SP 15
-
-/* Fixed length encoding used for integers for now. */
-# define DW_SLEB128_7BIT(value) \
-        0x00|((value) & 0x7f)
-# define DW_SLEB128_28BIT(value) \
-        0x80|((value)&0x7f), \
-        0x80|(((value)>>7)&0x7f), \
-        0x80|(((value)>>14)&0x7f), \
-        0x00|(((value)>>21)&0x7f)
-
-# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \
-        .cfi_escape \
-          0x0f, /* DW_CFA_def_cfa_expression */ \
-            DW_SLEB128_7BIT(11), /* length */ \
-          0x7f, /* DW_OP_breg15, rsp + constant */ \
-            DW_SLEB128_28BIT(rsp_offs), \
-          0x06, /* DW_OP_deref */ \
-          0x23, /* DW_OP_plus_constu */ \
-            DW_SLEB128_28BIT((cfa_depth)+160)
-
-.machine "z13+vx"
-.text
-
-.balign 16
-.Lconsts:
-.Lwordswap:
-	.byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3
-.Lbswap128:
-	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-.Lbswap32:
-	.byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
-.Lone:
-	.long 0, 0, 0, 1
-.Ladd_counter_0123:
-	.long 0, 1, 2, 3
-.Ladd_counter_4567:
-	.long 4, 5, 6, 7
-
-/* register macros */
-#define INPUT %r2
-#define DST   %r3
-#define SRC   %r4
-#define NBLKS %r0
-#define ROUND %r1
-
-/* stack structure */
-
-#define STACK_FRAME_STD    (8 * 16 + 8 * 4)
-#define STACK_FRAME_F8_F15 (8 * 8)
-#define STACK_FRAME_Y0_Y15 (16 * 16)
-#define STACK_FRAME_CTR    (4 * 16)
-#define STACK_FRAME_PARAMS (6 * 8)
-
-#define STACK_MAX   (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \
-		     STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \
-		     STACK_FRAME_PARAMS)
-
-#define STACK_F8     (STACK_MAX - STACK_FRAME_F8_F15)
-#define STACK_F9     (STACK_F8 + 8)
-#define STACK_F10    (STACK_F9 + 8)
-#define STACK_F11    (STACK_F10 + 8)
-#define STACK_F12    (STACK_F11 + 8)
-#define STACK_F13    (STACK_F12 + 8)
-#define STACK_F14    (STACK_F13 + 8)
-#define STACK_F15    (STACK_F14 + 8)
-#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15)
-#define STACK_CTR    (STACK_Y0_Y15 - STACK_FRAME_CTR)
-#define STACK_INPUT  (STACK_CTR - STACK_FRAME_PARAMS)
-#define STACK_DST    (STACK_INPUT + 8)
-#define STACK_SRC    (STACK_DST + 8)
-#define STACK_NBLKS  (STACK_SRC + 8)
-#define STACK_POCTX  (STACK_NBLKS + 8)
-#define STACK_POSRC  (STACK_POCTX + 8)
-
-#define STACK_G0_H3  STACK_Y0_Y15
-
-/* vector registers */
-#define A0 %v0
-#define A1 %v1
-#define A2 %v2
-#define A3 %v3
-
-#define B0 %v4
-#define B1 %v5
-#define B2 %v6
-#define B3 %v7
-
-#define C0 %v8
-#define C1 %v9
-#define C2 %v10
-#define C3 %v11
-
-#define D0 %v12
-#define D1 %v13
-#define D2 %v14
-#define D3 %v15
-
-#define E0 %v16
-#define E1 %v17
-#define E2 %v18
-#define E3 %v19
-
-#define F0 %v20
-#define F1 %v21
-#define F2 %v22
-#define F3 %v23
-
-#define G0 %v24
-#define G1 %v25
-#define G2 %v26
-#define G3 %v27
-
-#define H0 %v28
-#define H1 %v29
-#define H2 %v30
-#define H3 %v31
-
-#define IO0 E0
-#define IO1 E1
-#define IO2 E2
-#define IO3 E3
-#define IO4 F0
-#define IO5 F1
-#define IO6 F2
-#define IO7 F3
-
-#define S0 G0
-#define S1 G1
-#define S2 G2
-#define S3 G3
-
-#define TMP0 H0
-#define TMP1 H1
-#define TMP2 H2
-#define TMP3 H3
-
-#define X0 A0
-#define X1 A1
-#define X2 A2
-#define X3 A3
-#define X4 B0
-#define X5 B1
-#define X6 B2
-#define X7 B3
-#define X8 C0
-#define X9 C1
-#define X10 C2
-#define X11 C3
-#define X12 D0
-#define X13 D1
-#define X14 D2
-#define X15 D3
-
-#define Y0 E0
-#define Y1 E1
-#define Y2 E2
-#define Y3 E3
-#define Y4 F0
-#define Y5 F1
-#define Y6 F2
-#define Y7 F3
-#define Y8 G0
-#define Y9 G1
-#define Y10 G2
-#define Y11 G3
-#define Y12 H0
-#define Y13 H1
-#define Y14 H2
-#define Y15 H3
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-#define _ /*_*/
-
-#define START_STACK(last_r) \
-	lgr %r0, %r15; \
-	lghi %r1, ~15; \
-	stmg %r6, last_r, 6 * 8(%r15); \
-	aghi %r0, -STACK_MAX; \
-	ngr %r0, %r1; \
-	lgr %r1, %r15; \
-	cfi_def_cfa_register(1); \
-	lgr %r15, %r0; \
-	stg %r1, 0(%r15); \
-	cfi_cfa_on_stack(0, 0); \
-	std %f8, STACK_F8(%r15); \
-	std %f9, STACK_F9(%r15); \
-	std %f10, STACK_F10(%r15); \
-	std %f11, STACK_F11(%r15); \
-	std %f12, STACK_F12(%r15); \
-	std %f13, STACK_F13(%r15); \
-	std %f14, STACK_F14(%r15); \
-	std %f15, STACK_F15(%r15);
-
-#define END_STACK(last_r) \
-	lg %r1, 0(%r15); \
-	ld %f8, STACK_F8(%r15); \
-	ld %f9, STACK_F9(%r15); \
-	ld %f10, STACK_F10(%r15); \
-	ld %f11, STACK_F11(%r15); \
-	ld %f12, STACK_F12(%r15); \
-	ld %f13, STACK_F13(%r15); \
-	ld %f14, STACK_F14(%r15); \
-	ld %f15, STACK_F15(%r15); \
-	lmg %r6, last_r, 6 * 8(%r1); \
-	lgr %r15, %r1; \
-	cfi_def_cfa_register(DW_REGNO_SP);
-
-#define PLUS(dst,src) \
-	vaf dst, dst, src;
-
-#define XOR(dst,src) \
-	vx dst, dst, src;
-
-#define ROTATE(v1,c) \
-	verllf v1, v1, (c)(0);
-
-#define WORD_ROTATE(v1,s) \
-	vsldb v1, v1, v1, ((s) * 4);
-
-#define DST_8(OPER, I, J) \
-	OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \
-	OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J);
-
-/**********************************************************************
-  round macros
- **********************************************************************/
-
-/**********************************************************************
-  8-way chacha20 ("vertical")
- **********************************************************************/
-
-#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
-			      x8,x9,x10,x11,x12,x13,x14,x15,\
-			      y0,y1,y2,y3,y4,y5,y6,y7,\
-			      y8,y9,y10,y11,y12,y13,y14,y15,\
-			      op1,op2,op3,op4,op5,op6,op7,op8,\
-			      op9,op10,op11,op12) \
-	op1;							\
-	PLUS(x0, x1); PLUS(x4, x5);				\
-	PLUS(x8, x9); PLUS(x12, x13);				\
-	PLUS(y0, y1); PLUS(y4, y5);				\
-	PLUS(y8, y9); PLUS(y12, y13);				\
-	    op2;						\
-	    XOR(x3, x0);  XOR(x7, x4);				\
-	    XOR(x11, x8); XOR(x15, x12);			\
-	    XOR(y3, y0);  XOR(y7, y4);				\
-	    XOR(y11, y8); XOR(y15, y12);			\
-		op3;						\
-		ROTATE(x3, 16); ROTATE(x7, 16);			\
-		ROTATE(x11, 16); ROTATE(x15, 16);		\
-		ROTATE(y3, 16); ROTATE(y7, 16);			\
-		ROTATE(y11, 16); ROTATE(y15, 16);		\
-	op4;							\
-	PLUS(x2, x3); PLUS(x6, x7);				\
-	PLUS(x10, x11); PLUS(x14, x15);				\
-	PLUS(y2, y3); PLUS(y6, y7);				\
-	PLUS(y10, y11); PLUS(y14, y15);				\
-	    op5;						\
-	    XOR(x1, x2); XOR(x5, x6);				\
-	    XOR(x9, x10); XOR(x13, x14);			\
-	    XOR(y1, y2); XOR(y5, y6);				\
-	    XOR(y9, y10); XOR(y13, y14);			\
-		op6;						\
-		ROTATE(x1,12); ROTATE(x5,12);			\
-		ROTATE(x9,12); ROTATE(x13,12);			\
-		ROTATE(y1,12); ROTATE(y5,12);			\
-		ROTATE(y9,12); ROTATE(y13,12);			\
-	op7;							\
-	PLUS(x0, x1); PLUS(x4, x5);				\
-	PLUS(x8, x9); PLUS(x12, x13);				\
-	PLUS(y0, y1); PLUS(y4, y5);				\
-	PLUS(y8, y9); PLUS(y12, y13);				\
-	    op8;						\
-	    XOR(x3, x0); XOR(x7, x4);				\
-	    XOR(x11, x8); XOR(x15, x12);			\
-	    XOR(y3, y0); XOR(y7, y4);				\
-	    XOR(y11, y8); XOR(y15, y12);			\
-		op9;						\
-		ROTATE(x3,8); ROTATE(x7,8);			\
-		ROTATE(x11,8); ROTATE(x15,8);			\
-		ROTATE(y3,8); ROTATE(y7,8);			\
-		ROTATE(y11,8); ROTATE(y15,8);			\
-	op10;							\
-	PLUS(x2, x3); PLUS(x6, x7);				\
-	PLUS(x10, x11); PLUS(x14, x15);				\
-	PLUS(y2, y3); PLUS(y6, y7);				\
-	PLUS(y10, y11); PLUS(y14, y15);				\
-	    op11;						\
-	    XOR(x1, x2); XOR(x5, x6);				\
-	    XOR(x9, x10); XOR(x13, x14);			\
-	    XOR(y1, y2); XOR(y5, y6);				\
-	    XOR(y9, y10); XOR(y13, y14);			\
-		op12;						\
-		ROTATE(x1,7); ROTATE(x5,7);			\
-		ROTATE(x9,7); ROTATE(x13,7);			\
-		ROTATE(y1,7); ROTATE(y5,7);			\
-		ROTATE(y9,7); ROTATE(y13,7);
-
-#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\
-			 y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \
-	QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
-			      x8,x9,x10,x11,x12,x13,x14,x15,\
-			      y0,y1,y2,y3,y4,y5,y6,y7,\
-			      y8,y9,y10,y11,y12,y13,y14,y15,\
-			      ,,,,,,,,,,,)
-
-#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \
-	  vmrhf tmp0, v0, v1;					\
-	  vmrhf tmp1, v2, v3;					\
-	  vmrlf tmp2, v0, v1;					\
-	  vmrlf   v3, v2, v3;					\
-	  vmrhf tmpa, va, vb;					\
-	  vmrhf tmpb, vc, vd;					\
-	  vmrlf tmpc, va, vb;					\
-	  vmrlf   vd, vc, vd;					\
-	  vpdi v0, tmp0, tmp1, 0;				\
-	  vpdi v1, tmp0, tmp1, 5;				\
-	  vpdi v2, tmp2,   v3, 0;				\
-	  vpdi v3, tmp2,   v3, 5;				\
-	  vpdi va, tmpa, tmpb, 0;				\
-	  vpdi vb, tmpa, tmpb, 5;				\
-	  vpdi vc, tmpc,   vd, 0;				\
-	  vpdi vd, tmpc,   vd, 5;
-
-.balign 8
-.globl __chacha20_s390x_vx_blocks8
-ENTRY (__chacha20_s390x_vx_blocks8)
-	/* input:
-	 *	%r2: input
-	 *	%r3: dst
-	 *	%r4: src
-	 *	%r5: nblks (multiple of 8)
-	 */
-
-	START_STACK(%r8);
-	lgr NBLKS, %r5;
-
-	larl %r7, .Lconsts;
-
-	/* Load counter. */
-	lg %r8, (12 * 4)(INPUT);
-	rllg %r8, %r8, 32;
-
-.balign 4
-	/* Process eight chacha20 blocks per loop. */
-.Lloop8:
-	vlm Y0, Y3, 0(INPUT);
-
-	slgfi NBLKS, 8;
-	lghi ROUND, (20 / 2);
-
-	/* Construct counter vectors X12/X13 & Y12/Y13. */
-	vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7);
-	vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7);
-	vrepf Y12, Y3, 0;
-	vrepf Y13, Y3, 1;
-	vaccf X5, Y12, X4;
-	vaccf Y5, Y12, Y4;
-	vaf X12, Y12, X4;
-	vaf Y12, Y12, Y4;
-	vaf X13, Y13, X5;
-	vaf Y13, Y13, Y5;
-
-	vrepf X0, Y0, 0;
-	vrepf X1, Y0, 1;
-	vrepf X2, Y0, 2;
-	vrepf X3, Y0, 3;
-	vrepf X4, Y1, 0;
-	vrepf X5, Y1, 1;
-	vrepf X6, Y1, 2;
-	vrepf X7, Y1, 3;
-	vrepf X8, Y2, 0;
-	vrepf X9, Y2, 1;
-	vrepf X10, Y2, 2;
-	vrepf X11, Y2, 3;
-	vrepf X14, Y3, 2;
-	vrepf X15, Y3, 3;
-
-	/* Store counters for blocks 0-7. */
-	vstm X12, X13, (STACK_CTR + 0 * 16)(%r15);
-	vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15);
-
-	vlr Y0, X0;
-	vlr Y1, X1;
-	vlr Y2, X2;
-	vlr Y3, X3;
-	vlr Y4, X4;
-	vlr Y5, X5;
-	vlr Y6, X6;
-	vlr Y7, X7;
-	vlr Y8, X8;
-	vlr Y9, X9;
-	vlr Y10, X10;
-	vlr Y11, X11;
-	vlr Y14, X14;
-	vlr Y15, X15;
-
-	/* Update and store counter. */
-	agfi %r8, 8;
-	rllg %r5, %r8, 32;
-	stg %r5, (12 * 4)(INPUT);
-
-.balign 4
-.Lround2_8:
-	QUARTERROUND4_V8(X0, X4,  X8, X12,   X1, X5,  X9, X13,
-			 X2, X6, X10, X14,   X3, X7, X11, X15,
-			 Y0, Y4,  Y8, Y12,   Y1, Y5,  Y9, Y13,
-			 Y2, Y6, Y10, Y14,   Y3, Y7, Y11, Y15);
-	QUARTERROUND4_V8(X0, X5, X10, X15,   X1, X6, X11, X12,
-			 X2, X7,  X8, X13,   X3, X4,  X9, X14,
-			 Y0, Y5, Y10, Y15,   Y1, Y6, Y11, Y12,
-			 Y2, Y7,  Y8, Y13,   Y3, Y4,  Y9, Y14);
-	brctg ROUND, .Lround2_8;
-
-	/* Store blocks 4-7. */
-	vstm Y0, Y15, STACK_Y0_Y15(%r15);
-
-	/* Load counters for blocks 0-3. */
-	vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15);
-
-	lghi ROUND, 1;
-	j .Lfirst_output_4blks_8;
-
-.balign 4
-.Lsecond_output_4blks_8:
-	/* Load blocks 4-7. */
-	vlm X0, X15, STACK_Y0_Y15(%r15);
-
-	/* Load counters for blocks 4-7. */
-	vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15);
-
-	lghi ROUND, 0;
-
-.balign 4
-	/* Output four chacha20 blocks per loop. */
-.Lfirst_output_4blks_8:
-	vlm Y12, Y15, 0(INPUT);
-	PLUS(X12, Y0);
-	PLUS(X13, Y1);
-	vrepf Y0, Y12, 0;
-	vrepf Y1, Y12, 1;
-	vrepf Y2, Y12, 2;
-	vrepf Y3, Y12, 3;
-	vrepf Y4, Y13, 0;
-	vrepf Y5, Y13, 1;
-	vrepf Y6, Y13, 2;
-	vrepf Y7, Y13, 3;
-	vrepf Y8, Y14, 0;
-	vrepf Y9, Y14, 1;
-	vrepf Y10, Y14, 2;
-	vrepf Y11, Y14, 3;
-	vrepf Y14, Y15, 2;
-	vrepf Y15, Y15, 3;
-	PLUS(X0, Y0);
-	PLUS(X1, Y1);
-	PLUS(X2, Y2);
-	PLUS(X3, Y3);
-	PLUS(X4, Y4);
-	PLUS(X5, Y5);
-	PLUS(X6, Y6);
-	PLUS(X7, Y7);
-	PLUS(X8, Y8);
-	PLUS(X9, Y9);
-	PLUS(X10, Y10);
-	PLUS(X11, Y11);
-	PLUS(X14, Y14);
-	PLUS(X15, Y15);
-
-	vl Y15, (.Lbswap32 - .Lconsts)(%r7);
-	TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7,
-			Y9, Y10, Y11, Y12, Y13, Y14);
-	TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15,
-			Y9, Y10, Y11, Y12, Y13, Y14);
-
-	vlm Y0, Y14, 0(SRC);
-	vperm X0, X0, X0, Y15;
-	vperm X1, X1, X1, Y15;
-	vperm X2, X2, X2, Y15;
-	vperm X3, X3, X3, Y15;
-	vperm X4, X4, X4, Y15;
-	vperm X5, X5, X5, Y15;
-	vperm X6, X6, X6, Y15;
-	vperm X7, X7, X7, Y15;
-	vperm X8, X8, X8, Y15;
-	vperm X9, X9, X9, Y15;
-	vperm X10, X10, X10, Y15;
-	vperm X11, X11, X11, Y15;
-	vperm X12, X12, X12, Y15;
-	vperm X13, X13, X13, Y15;
-	vperm X14, X14, X14, Y15;
-	vperm X15, X15, X15, Y15;
-	vl Y15, (15 * 16)(SRC);
-
-	XOR(Y0, X0);
-	XOR(Y1, X4);
-	XOR(Y2, X8);
-	XOR(Y3, X12);
-	XOR(Y4, X1);
-	XOR(Y5, X5);
-	XOR(Y6, X9);
-	XOR(Y7, X13);
-	XOR(Y8, X2);
-	XOR(Y9, X6);
-	XOR(Y10, X10);
-	XOR(Y11, X14);
-	XOR(Y12, X3);
-	XOR(Y13, X7);
-	XOR(Y14, X11);
-	XOR(Y15, X15);
-	vstm Y0, Y15, 0(DST);
-
-	aghi SRC, 256;
-	aghi DST, 256;
-
-	clgije ROUND, 1, .Lsecond_output_4blks_8;
-
-	clgijhe NBLKS, 8, .Lloop8;
-
-
-	END_STACK(%r8);
-	xgr %r2, %r2;
-	br %r14;
-END (__chacha20_s390x_vx_blocks8)
-
-#endif /* HAVE_S390_VX_ASM_SUPPORT */
diff --git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h
deleted file mode 100644
index 0c6abf77e8..0000000000
--- a/sysdeps/s390/s390-64/chacha20_arch.h
+++ /dev/null
@@ -1,45 +0,0 @@
-/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-#include <sys/auxv.h>
-
-unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst,
-					  const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-#ifdef HAVE_S390_VX_ASM_SUPPORT
-  _Static_assert (CHACHA20_BUFSIZE % 8 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 8");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-  if (GLRO(dl_hwcap) & HWCAP_S390_VX)
-    {
-      __chacha20_s390x_vx_blocks8 (state, dst, src,
-				   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-      return;
-    }
-#endif
-  chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c
index 0326ebb767..c8a9ed2d40 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.c
+++ b/sysdeps/unix/sysv/linux/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -26,13 +25,4 @@ __glibc_tls_internal_free (void)
   struct pthread *self = THREAD_SELF;
   free (self->tls_state.strsignal_buf);
   free (self->tls_state.strerror_l_buf);
-
-  if (self->tls_state.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-         cached it won't leak any data.  */
-      explicit_bzero (self->tls_state.rand_state,
-		      sizeof (*self->tls_state.rand_state));
-      free (self->tls_state.rand_state);
-    }
 }
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 1178475d75..c19bef2dec 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -5,13 +5,6 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += link-defines.sym
 endif
 
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
-  chacha20-amd64-sse2 \
-  chacha20-amd64-avx2 \
-  # sysdep_routines
-endif
-
 ifeq ($(subdir),gmon)
 sysdep_routines += _mcount
 # We cannot compile _mcount.S with -pg because that would create
diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S
deleted file mode 100644
index aefd1cdbd0..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-avx2.S
+++ /dev/null
@@ -1,328 +0,0 @@
-/* Optimized AVX2 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-amd64-avx2.S  -  AVX2 implementation of ChaCha20 cipher
-
-   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-
-#ifdef PIC
-#  define rRIP (%rip)
-#else
-#  define rRIP
-#endif
-
-/* register macros */
-#define INPUT %rdi
-#define DST   %rsi
-#define SRC   %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (32)
-#define STACK_VEC_X13 (32 + STACK_VEC_X12)
-#define STACK_TMP     (32 + STACK_VEC_X13)
-#define STACK_TMP1    (32 + STACK_TMP)
-
-#define STACK_MAX     (32 + STACK_TMP1)
-
-/* vector registers */
-#define X0 %ymm0
-#define X1 %ymm1
-#define X2 %ymm2
-#define X3 %ymm3
-#define X4 %ymm4
-#define X5 %ymm5
-#define X6 %ymm6
-#define X7 %ymm7
-#define X8 %ymm8
-#define X9 %ymm9
-#define X10 %ymm10
-#define X11 %ymm11
-#define X12 %ymm12
-#define X13 %ymm13
-#define X14 %ymm14
-#define X15 %ymm15
-
-#define X0h %xmm0
-#define X1h %xmm1
-#define X2h %xmm2
-#define X3h %xmm3
-#define X4h %xmm4
-#define X5h %xmm5
-#define X6h %xmm6
-#define X7h %xmm7
-#define X8h %xmm8
-#define X9h %xmm9
-#define X10h %xmm10
-#define X11h %xmm11
-#define X12h %xmm12
-#define X13h %xmm13
-#define X14h %xmm14
-#define X15h %xmm15
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0,x1,x2,x3,t1,t2) \
-	vpunpckhdq x1, x0, t2; \
-	vpunpckldq x1, x0, x0; \
-	\
-	vpunpckldq x3, x2, t1; \
-	vpunpckhdq x3, x2, x2; \
-	\
-	vpunpckhqdq t1, x0, x1; \
-	vpunpcklqdq t1, x0, x0; \
-	\
-	vpunpckhqdq x2, t2, x3; \
-	vpunpcklqdq x2, t2, x2;
-
-/* 2x2 128-bit matrix transpose */
-#define transpose_16byte_2x2(x0,x1,t1) \
-	vmovdqa    x0, t1; \
-	vperm2i128 $0x20, x1, x0, x0; \
-	vperm2i128 $0x31, x1, t1, x1;
-
-/**********************************************************************
-  8-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp)	\
-	vpsrld $(32 - (c)), v1, tmp;	\
-	vpslld $(c), v1, v1;		\
-	vpaddb tmp, v1, v1;		\
-	vpsrld $(32 - (c)), v2, tmp;	\
-	vpslld $(c), v2, v2;		\
-	vpaddb tmp, v2, v2;
-
-#define ROTATE_SHUF_2(v1,v2,shuf)	\
-	vpshufb shuf, v1, v1;		\
-	vpshufb shuf, v2, v2;
-
-#define XOR(ds,s) \
-	vpxor s, ds, ds;
-
-#define PLUS(ds,s) \
-	vpaddd s, ds, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\
-		      interleave_op1,interleave_op2,\
-		      interleave_op3,interleave_op4)		\
-	vbroadcasti128 .Lshuf_rol16 rRIP, tmp1;			\
-		interleave_op1;					\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE_SHUF_2(d1, d2, tmp1);			\
-		interleave_op2;					\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2, 12, tmp1);				\
-	vbroadcasti128 .Lshuf_rol8 rRIP, tmp1;			\
-		interleave_op3;					\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE_SHUF_2(d1, d2, tmp1);			\
-		interleave_op4;					\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2,  7, tmp1);
-
-	.section .text.avx2, "ax", @progbits
-	.align 32
-chacha20_data:
-L(shuf_rol16):
-	.byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
-L(shuf_rol8):
-	.byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
-L(inc_counter):
-	.byte 0,1,2,3,4,5,6,7
-L(unsigned_cmp):
-	.long 0x80000000
-
-	.hidden __chacha20_avx2_blocks8
-ENTRY (__chacha20_avx2_blocks8)
-	/* input:
-	 *	%rdi: input
-	 *	%rsi: dst
-	 *	%rdx: src
-	 *	%rcx: nblks (multiple of 8)
-	 */
-	vzeroupper;
-
-	pushq %rbp;
-	cfi_adjust_cfa_offset(8);
-	cfi_rel_offset(rbp, 0)
-	movq %rsp, %rbp;
-	cfi_def_cfa_register(rbp);
-
-	subq $STACK_MAX, %rsp;
-	andq $~31, %rsp;
-
-L(loop8):
-	mov $20, ROUND;
-
-	/* Construct counter vectors X12 and X13 */
-	vpmovzxbd L(inc_counter) rRIP, X0;
-	vpbroadcastd L(unsigned_cmp) rRIP, X2;
-	vpbroadcastd (12 * 4)(INPUT), X12;
-	vpbroadcastd (13 * 4)(INPUT), X13;
-	vpaddd X0, X12, X12;
-	vpxor X2, X0, X0;
-	vpxor X2, X12, X1;
-	vpcmpgtd X1, X0, X0;
-	vpsubd X0, X13, X13;
-	vmovdqa X12, (STACK_VEC_X12)(%rsp);
-	vmovdqa X13, (STACK_VEC_X13)(%rsp);
-
-	/* Load vectors */
-	vpbroadcastd (0 * 4)(INPUT), X0;
-	vpbroadcastd (1 * 4)(INPUT), X1;
-	vpbroadcastd (2 * 4)(INPUT), X2;
-	vpbroadcastd (3 * 4)(INPUT), X3;
-	vpbroadcastd (4 * 4)(INPUT), X4;
-	vpbroadcastd (5 * 4)(INPUT), X5;
-	vpbroadcastd (6 * 4)(INPUT), X6;
-	vpbroadcastd (7 * 4)(INPUT), X7;
-	vpbroadcastd (8 * 4)(INPUT), X8;
-	vpbroadcastd (9 * 4)(INPUT), X9;
-	vpbroadcastd (10 * 4)(INPUT), X10;
-	vpbroadcastd (11 * 4)(INPUT), X11;
-	vpbroadcastd (14 * 4)(INPUT), X14;
-	vpbroadcastd (15 * 4)(INPUT), X15;
-	vmovdqa X15, (STACK_TMP)(%rsp);
-
-L(round2):
-	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X15,,,,)
-	vmovdqa (STACK_TMP)(%rsp), X15;
-	vmovdqa X8, (STACK_TMP)(%rsp);
-	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,,,,)
-	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,,,,)
-	vmovdqa (STACK_TMP)(%rsp), X8;
-	vmovdqa X15, (STACK_TMP)(%rsp);
-	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X15,,,,)
-	sub $2, ROUND;
-	jnz L(round2);
-
-	vmovdqa X8, (STACK_TMP1)(%rsp);
-
-	/* tmp := X15 */
-	vpbroadcastd (0 * 4)(INPUT), X15;
-	PLUS(X0, X15);
-	vpbroadcastd (1 * 4)(INPUT), X15;
-	PLUS(X1, X15);
-	vpbroadcastd (2 * 4)(INPUT), X15;
-	PLUS(X2, X15);
-	vpbroadcastd (3 * 4)(INPUT), X15;
-	PLUS(X3, X15);
-	vpbroadcastd (4 * 4)(INPUT), X15;
-	PLUS(X4, X15);
-	vpbroadcastd (5 * 4)(INPUT), X15;
-	PLUS(X5, X15);
-	vpbroadcastd (6 * 4)(INPUT), X15;
-	PLUS(X6, X15);
-	vpbroadcastd (7 * 4)(INPUT), X15;
-	PLUS(X7, X15);
-	transpose_4x4(X0, X1, X2, X3, X8, X15);
-	transpose_4x4(X4, X5, X6, X7, X8, X15);
-	vmovdqa (STACK_TMP1)(%rsp), X8;
-	transpose_16byte_2x2(X0, X4, X15);
-	transpose_16byte_2x2(X1, X5, X15);
-	transpose_16byte_2x2(X2, X6, X15);
-	transpose_16byte_2x2(X3, X7, X15);
-	vmovdqa (STACK_TMP)(%rsp), X15;
-	vmovdqu X0, (64 * 0 + 16 * 0)(DST)
-	vmovdqu X1, (64 * 1 + 16 * 0)(DST)
-	vpbroadcastd (8 * 4)(INPUT), X0;
-	PLUS(X8, X0);
-	vpbroadcastd (9 * 4)(INPUT), X0;
-	PLUS(X9, X0);
-	vpbroadcastd (10 * 4)(INPUT), X0;
-	PLUS(X10, X0);
-	vpbroadcastd (11 * 4)(INPUT), X0;
-	PLUS(X11, X0);
-	vmovdqa (STACK_VEC_X12)(%rsp), X0;
-	PLUS(X12, X0);
-	vmovdqa (STACK_VEC_X13)(%rsp), X0;
-	PLUS(X13, X0);
-	vpbroadcastd (14 * 4)(INPUT), X0;
-	PLUS(X14, X0);
-	vpbroadcastd (15 * 4)(INPUT), X0;
-	PLUS(X15, X0);
-	vmovdqu X2, (64 * 2 + 16 * 0)(DST)
-	vmovdqu X3, (64 * 3 + 16 * 0)(DST)
-
-	/* Update counter */
-	addq $8, (12 * 4)(INPUT);
-
-	transpose_4x4(X8, X9, X10, X11, X0, X1);
-	transpose_4x4(X12, X13, X14, X15, X0, X1);
-	vmovdqu X4, (64 * 4 + 16 * 0)(DST)
-	vmovdqu X5, (64 * 5 + 16 * 0)(DST)
-	transpose_16byte_2x2(X8, X12, X0);
-	transpose_16byte_2x2(X9, X13, X0);
-	transpose_16byte_2x2(X10, X14, X0);
-	transpose_16byte_2x2(X11, X15, X0);
-	vmovdqu X6,  (64 * 6 + 16 * 0)(DST)
-	vmovdqu X7,  (64 * 7 + 16 * 0)(DST)
-	vmovdqu X8,  (64 * 0 + 16 * 2)(DST)
-	vmovdqu X9,  (64 * 1 + 16 * 2)(DST)
-	vmovdqu X10, (64 * 2 + 16 * 2)(DST)
-	vmovdqu X11, (64 * 3 + 16 * 2)(DST)
-	vmovdqu X12, (64 * 4 + 16 * 2)(DST)
-	vmovdqu X13, (64 * 5 + 16 * 2)(DST)
-	vmovdqu X14, (64 * 6 + 16 * 2)(DST)
-	vmovdqu X15, (64 * 7 + 16 * 2)(DST)
-
-	sub $8, NBLKS;
-	lea (8 * 64)(DST), DST;
-	lea (8 * 64)(SRC), SRC;
-	jnz L(loop8);
-
-	vzeroupper;
-
-	/* eax zeroed by round loop. */
-	leave;
-	cfi_adjust_cfa_offset(-8)
-	cfi_def_cfa_register(%rsp);
-	ret;
-	int3;
-END(__chacha20_avx2_blocks8)
diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S
deleted file mode 100644
index 351a1109c6..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-sse2.S
+++ /dev/null
@@ -1,311 +0,0 @@
-/* Optimized SSE2 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-amd64-ssse3.S  -  SSSE3 implementation of ChaCha20 cipher
-
-   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-#include <isa-level.h>
-
-#if MINIMUM_X86_ISA_LEVEL <= 2
-
-#ifdef PIC
-#  define rRIP (%rip)
-#else
-#  define rRIP
-#endif
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
-        ret; int3;
-
-/* register macros */
-#define INPUT %rdi
-#define DST   %rsi
-#define SRC   %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (16)
-#define STACK_VEC_X13 (16 + STACK_VEC_X12)
-#define STACK_TMP     (16 + STACK_VEC_X13)
-#define STACK_TMP1    (16 + STACK_TMP)
-#define STACK_TMP2    (16 + STACK_TMP1)
-
-#define STACK_MAX     (16 + STACK_TMP2)
-
-/* vector registers */
-#define X0 %xmm0
-#define X1 %xmm1
-#define X2 %xmm2
-#define X3 %xmm3
-#define X4 %xmm4
-#define X5 %xmm5
-#define X6 %xmm6
-#define X7 %xmm7
-#define X8 %xmm8
-#define X9 %xmm9
-#define X10 %xmm10
-#define X11 %xmm11
-#define X12 %xmm12
-#define X13 %xmm13
-#define X14 %xmm14
-#define X15 %xmm15
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \
-	movdqa    x0, t2; \
-	punpckhdq x1, t2; \
-	punpckldq x1, x0; \
-	\
-	movdqa    x2, t1; \
-	punpckldq x3, t1; \
-	punpckhdq x3, x2; \
-	\
-	movdqa     x0, x1; \
-	punpckhqdq t1, x1; \
-	punpcklqdq t1, x0; \
-	\
-	movdqa     t2, x3; \
-	punpckhqdq x2, x3; \
-	punpcklqdq x2, t2; \
-	movdqa     t2, x2;
-
-/* fill xmm register with 32-bit value from memory */
-#define PBROADCASTD(mem32, xreg) \
-	movd mem32, xreg; \
-	pshufd $0, xreg, xreg;
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp1,tmp2)	\
-	movdqa v1, tmp1; 		\
-	movdqa v2, tmp2; 		\
-	psrld $(32 - (c)), v1;		\
-	pslld $(c), tmp1;		\
-	paddb tmp1, v1;			\
-	psrld $(32 - (c)), v2;		\
-	pslld $(c), tmp2;		\
-	paddb tmp2, v2;
-
-#define XOR(ds,s) \
-	pxor s, ds;
-
-#define PLUS(ds,s) \
-	paddd s, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2)	\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE2(d1, d2, 16, tmp1, tmp2);			\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2, 12, tmp1, tmp2);			\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE2(d1, d2, 8, tmp1, tmp2);			\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2,  7, tmp1, tmp2);
-
-	.section .text.sse2,"ax",@progbits
-
-chacha20_data:
-	.align 16
-L(counter1):
-	.long 1,0,0,0
-L(inc_counter):
-	.long 0,1,2,3
-L(unsigned_cmp):
-	.long 0x80000000,0x80000000,0x80000000,0x80000000
-
-	.hidden __chacha20_sse2_blocks4
-ENTRY (__chacha20_sse2_blocks4)
-	/* input:
-	 *	%rdi: input
-	 *	%rsi: dst
-	 *	%rdx: src
-	 *	%rcx: nblks (multiple of 4)
-	 */
-
-	pushq %rbp;
-	cfi_adjust_cfa_offset(8);
-	cfi_rel_offset(rbp, 0)
-	movq %rsp, %rbp;
-	cfi_def_cfa_register(%rbp);
-
-	subq $STACK_MAX, %rsp;
-	andq $~15, %rsp;
-
-L(loop4):
-	mov $20, ROUND;
-
-	/* Construct counter vectors X12 and X13 */
-	movdqa L(inc_counter) rRIP, X0;
-	movdqa L(unsigned_cmp) rRIP, X2;
-	PBROADCASTD((12 * 4)(INPUT), X12);
-	PBROADCASTD((13 * 4)(INPUT), X13);
-	paddd X0, X12;
-	movdqa X12, X1;
-	pxor X2, X0;
-	pxor X2, X1;
-	pcmpgtd X1, X0;
-	psubd X0, X13;
-	movdqa X12, (STACK_VEC_X12)(%rsp);
-	movdqa X13, (STACK_VEC_X13)(%rsp);
-
-	/* Load vectors */
-	PBROADCASTD((0 * 4)(INPUT), X0);
-	PBROADCASTD((1 * 4)(INPUT), X1);
-	PBROADCASTD((2 * 4)(INPUT), X2);
-	PBROADCASTD((3 * 4)(INPUT), X3);
-	PBROADCASTD((4 * 4)(INPUT), X4);
-	PBROADCASTD((5 * 4)(INPUT), X5);
-	PBROADCASTD((6 * 4)(INPUT), X6);
-	PBROADCASTD((7 * 4)(INPUT), X7);
-	PBROADCASTD((8 * 4)(INPUT), X8);
-	PBROADCASTD((9 * 4)(INPUT), X9);
-	PBROADCASTD((10 * 4)(INPUT), X10);
-	PBROADCASTD((11 * 4)(INPUT), X11);
-	PBROADCASTD((14 * 4)(INPUT), X14);
-	PBROADCASTD((15 * 4)(INPUT), X15);
-	movdqa X11, (STACK_TMP)(%rsp);
-	movdqa X15, (STACK_TMP1)(%rsp);
-
-L(round2_4):
-	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X11,X15)
-	movdqa (STACK_TMP)(%rsp), X11;
-	movdqa (STACK_TMP1)(%rsp), X15;
-	movdqa X8, (STACK_TMP)(%rsp);
-	movdqa X9, (STACK_TMP1)(%rsp);
-	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,X9)
-	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,X9)
-	movdqa (STACK_TMP)(%rsp), X8;
-	movdqa (STACK_TMP1)(%rsp), X9;
-	movdqa X11, (STACK_TMP)(%rsp);
-	movdqa X15, (STACK_TMP1)(%rsp);
-	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X11,X15)
-	sub $2, ROUND;
-	jnz L(round2_4);
-
-	/* tmp := X15 */
-	movdqa (STACK_TMP)(%rsp), X11;
-	PBROADCASTD((0 * 4)(INPUT), X15);
-	PLUS(X0, X15);
-	PBROADCASTD((1 * 4)(INPUT), X15);
-	PLUS(X1, X15);
-	PBROADCASTD((2 * 4)(INPUT), X15);
-	PLUS(X2, X15);
-	PBROADCASTD((3 * 4)(INPUT), X15);
-	PLUS(X3, X15);
-	PBROADCASTD((4 * 4)(INPUT), X15);
-	PLUS(X4, X15);
-	PBROADCASTD((5 * 4)(INPUT), X15);
-	PLUS(X5, X15);
-	PBROADCASTD((6 * 4)(INPUT), X15);
-	PLUS(X6, X15);
-	PBROADCASTD((7 * 4)(INPUT), X15);
-	PLUS(X7, X15);
-	PBROADCASTD((8 * 4)(INPUT), X15);
-	PLUS(X8, X15);
-	PBROADCASTD((9 * 4)(INPUT), X15);
-	PLUS(X9, X15);
-	PBROADCASTD((10 * 4)(INPUT), X15);
-	PLUS(X10, X15);
-	PBROADCASTD((11 * 4)(INPUT), X15);
-	PLUS(X11, X15);
-	movdqa (STACK_VEC_X12)(%rsp), X15;
-	PLUS(X12, X15);
-	movdqa (STACK_VEC_X13)(%rsp), X15;
-	PLUS(X13, X15);
-	movdqa X13, (STACK_TMP)(%rsp);
-	PBROADCASTD((14 * 4)(INPUT), X15);
-	PLUS(X14, X15);
-	movdqa (STACK_TMP1)(%rsp), X15;
-	movdqa X14, (STACK_TMP1)(%rsp);
-	PBROADCASTD((15 * 4)(INPUT), X13);
-	PLUS(X15, X13);
-	movdqa X15, (STACK_TMP2)(%rsp);
-
-	/* Update counter */
-	addq $4, (12 * 4)(INPUT);
-
-	TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15);
-	movdqu X0, (64 * 0 + 16 * 0)(DST)
-	movdqu X1, (64 * 1 + 16 * 0)(DST)
-	movdqu X2, (64 * 2 + 16 * 0)(DST)
-	movdqu X3, (64 * 3 + 16 * 0)(DST)
-	TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2);
-	movdqa (STACK_TMP)(%rsp), X13;
-	movdqa (STACK_TMP1)(%rsp), X14;
-	movdqa (STACK_TMP2)(%rsp), X15;
-	movdqu X4, (64 * 0 + 16 * 1)(DST)
-	movdqu X5, (64 * 1 + 16 * 1)(DST)
-	movdqu X6, (64 * 2 + 16 * 1)(DST)
-	movdqu X7, (64 * 3 + 16 * 1)(DST)
-	TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2);
-	movdqu X8,  (64 * 0 + 16 * 2)(DST)
-	movdqu X9,  (64 * 1 + 16 * 2)(DST)
-	movdqu X10, (64 * 2 + 16 * 2)(DST)
-	movdqu X11, (64 * 3 + 16 * 2)(DST)
-	TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2);
-	movdqu X12, (64 * 0 + 16 * 3)(DST)
-	movdqu X13, (64 * 1 + 16 * 3)(DST)
-	movdqu X14, (64 * 2 + 16 * 3)(DST)
-	movdqu X15, (64 * 3 + 16 * 3)(DST)
-
-	sub $4, NBLKS;
-	lea (4 * 64)(DST), DST;
-	lea (4 * 64)(SRC), SRC;
-	jnz L(loop4);
-
-	/* eax zeroed by round loop. */
-	leave;
-	cfi_adjust_cfa_offset(-8)
-	cfi_def_cfa_register(%rsp);
-	ret_spec_stop;
-END (__chacha20_sse2_blocks4)
-
-#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */
diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
deleted file mode 100644
index 6f3784e392..0000000000
--- a/sysdeps/x86_64/chacha20_arch.h
+++ /dev/null
@@ -1,55 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <isa-level.h>
-#include <ldsodefs.h>
-#include <cpu-features.h>
-#include <sys/param.h>
-
-unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4 or 8");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-#if MINIMUM_X86_ISA_LEVEL > 2
-  __chacha20_avx2_blocks8 (state, dst, src,
-			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  const struct cpu_features* cpu_features = __get_cpu_features ();
-
-  /* AVX2 version uses vzeroupper, so disable it if RTM is enabled.  */
-  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
-      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !))
-    __chacha20_avx2_blocks8 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    __chacha20_sse2_blocks4 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#endif
-}
-- 
2.35.1



* Re: [PATCH] arc4random: simplify design for better safety
  2022-07-25 22:57   ` [PATCH] arc4random: simplify design for better safety Jason A. Donenfeld
@ 2022-07-25 23:11     ` Jason A. Donenfeld
  2022-07-25 23:28     ` [PATCH v2] " Jason A. Donenfeld
  2022-07-26 13:30     ` [PATCH v4] " Jason A. Donenfeld
  2 siblings, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-25 23:11 UTC (permalink / raw)
  To: libc-alpha
  Cc: Adhemerval Zanella Netto, Florian Weimer,
	Cristian Rodríguez, Paul Eggert, linux-crypto

If you're just following along on the mailing list, without actively
trying to apply this to a glibc tree, that diff might be hard to read.
The meat of it is the below function implementation. Notably this is
basically the same as systemd's crypto_random_bytes() (which I recently
rewrote there).

void
__arc4random_buf (void *p, size_t n)
{
  static bool have_getrandom = true, seen_initialized = false;
  int fd;

  if (n == 0)
    return;

  for (;;)
    {
      ssize_t l;

      if (!have_getrandom)
        break;

      l = __getrandom_nocancel (p, n, 0);
      if (l > 0)
        {
          if ((size_t) l == n)
              return; /* Done reading, success. */
          p = (uint8_t *) p + l;
          n -= l;
          continue; /* Interrupted by a signal; keep going. */
        }
      else if (l == 0)
        arc4random_getrandom_failure (); /* Weird, should never happen. */
      else if (errno == ENOSYS)
        {
          have_getrandom = false;
          break; /* No syscall, so fall back to /dev/urandom. */
        }
      arc4random_getrandom_failure (); /* Unknown other error, should never happen. */
    }

  if (!seen_initialized)
    {
      struct pollfd pfd = { .events = POLLIN };
      pfd.fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
      if (pfd.fd < 0)
        arc4random_getrandom_failure ();
      if (__poll(&pfd, 1, -1) < 0)
        arc4random_getrandom_failure ();
      if (__close_nocancel(pfd.fd) < 0)
        arc4random_getrandom_failure ();
      seen_initialized = true;
    }

  fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY));
  if (fd < 0)
    arc4random_getrandom_failure ();
  while (n)
    {
      ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
      if (l <= 0)
        arc4random_getrandom_failure ();
      p = (uint8_t *) p + l;
      n -= l;
    }
  if (__close_nocancel (fd) < 0)
    arc4random_getrandom_failure ();
}
libc_hidden_def (__arc4random_buf)
weak_alias (__arc4random_buf, arc4random_buf)
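
For readers who want to try the getrandom() part of that loop outside of
a glibc tree, here is a minimal sketch using only the public
getrandom() wrapper from <sys/random.h>. The name fill_random is
hypothetical and not part of glibc. Unlike the function above, it
reports errors to the caller instead of aborting, and it retries on
EINTR; both of those are assumptions made for the sketch, not something
the patch does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper, not part of glibc: fill BUF with LEN bytes from
   getrandom(2), looping over short reads.  Returns 0 on success and -1
   with errno set on failure.  */
int
fill_random (void *buf, size_t len)
{
  uint8_t *p = buf;

  assert (buf != NULL || len == 0);
  while (len > 0)
    {
      /* Flags of 0 block until the kernel RNG has been initialized.  */
      ssize_t l = getrandom (p, len, 0);
      if (l < 0)
        {
          if (errno == EINTR)
            continue; /* Assumption of this sketch: retry on signals.  */
          return -1;  /* E.g. ENOSYS on kernels without the syscall.  */
        }
      if (l == 0)
        {
          errno = EIO; /* Should never happen; report it as an error.  */
          return -1;
        }
      p += l;
      len -= (size_t) l;
    }
  return 0;
}
```

A caller would use it as fill_random (key, sizeof key) and treat a
nonzero return as fatal, which is roughly what
arc4random_getrandom_failure () does in the function above.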


* [PATCH v2] arc4random: simplify design for better safety
  2022-07-25 22:57   ` [PATCH] arc4random: simplify design for better safety Jason A. Donenfeld
  2022-07-25 23:11     ` Jason A. Donenfeld
@ 2022-07-25 23:28     ` Jason A. Donenfeld
  2022-07-25 23:59       ` Eric Biggers
                         ` (3 more replies)
  2022-07-26 13:30     ` [PATCH v4] " Jason A. Donenfeld
  2 siblings, 4 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-25 23:28 UTC (permalink / raw)
  To: libc-alpha
  Cc: Jason A. Donenfeld, Adhemerval Zanella Netto, Florian Weimer,
	Cristian Rodríguez, Paul Eggert, linux-crypto

Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc can work
together on optimizing that.

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering
altogether. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.

This patch might be considered along the lines of, "premature
optimization is the root of all evil," in that the much more complex
implementation it replaces moves too fast without considering security
implications, whereas the incremental approach done here is a much safer
way of going about things. Once this lands, we can take our time in
optimizing this properly using new interplay between the kernel and
userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fall back to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, but it's the best approximation
we can do.
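
The fallback described above can be sketched with public POSIX calls
roughly as follows. This is an illustration under stated assumptions,
not the code in the diff: the names wait_for_rng_init and read_urandom
are hypothetical, errors are returned rather than treated as fatal, and
poll() is retried on EINTR, which the patch itself does not do.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: block until /dev/random polls readable, which is
   taken as a rough sign that the RNG has been initialized.  */
int
wait_for_rng_init (void)
{
  struct pollfd pfd = { .events = POLLIN };
  int ret;

  do
    pfd.fd = open ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY);
  while (pfd.fd < 0 && errno == EINTR);
  if (pfd.fd < 0)
    return -1;

  do
    ret = poll (&pfd, 1, -1);
  while (ret < 0 && errno == EINTR); /* Assumption: retry on signals.  */

  close (pfd.fd);
  return ret < 0 ? -1 : 0;
}

/* Hypothetical helper: read exactly LEN bytes from /dev/urandom into
   BUF.  Returns 0 on success and -1 on failure.  */
int
read_urandom (void *buf, size_t len)
{
  uint8_t *p = buf;
  int fd;

  assert (buf != NULL || len == 0);
  do
    fd = open ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY);
  while (fd < 0 && errno == EINTR);
  if (fd < 0)
    return -1;

  while (len > 0)
    {
      ssize_t l = read (fd, p, len);
      if (l < 0 && errno == EINTR)
        continue;
      if (l <= 0)
        {
          close (fd);
          return -1;
        }
      p += l;
      len -= (size_t) l;
    }
  return close (fd) < 0 ? -1 : 0;
}
```

A caller would run wait_for_rng_init () once, remember that it
succeeded, and then use read_urandom () for every request, mirroring the
seen_initialized flag in the patch.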

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 LICENSES                                      |  23 -
 include/stdlib.h                              |   3 -
 stdlib/Makefile                               |   2 -
 stdlib/arc4random.c                           | 204 ++-----
 stdlib/arc4random.h                           |  48 --
 stdlib/chacha20.c                             | 191 ------
 stdlib/tst-arc4random-chacha20.c              | 167 -----
 sysdeps/aarch64/Makefile                      |   4 -
 sysdeps/aarch64/chacha20-aarch64.S            | 314 ----------
 sysdeps/aarch64/chacha20_arch.h               |  40 --
 sysdeps/generic/chacha20_arch.h               |  24 -
 sysdeps/generic/tls-internal.c                |  10 -
 sysdeps/mach/hurd/_Fork.c                     |   2 -
 sysdeps/nptl/_Fork.c                          |   2 -
 .../powerpc/powerpc64/be/multiarch/Makefile   |   4 -
 .../powerpc64/be/multiarch/chacha20-ppc.c     |   1 -
 .../powerpc64/be/multiarch/chacha20_arch.h    |  42 --
 sysdeps/powerpc/powerpc64/power8/Makefile     |   5 -
 .../powerpc/powerpc64/power8/chacha20-ppc.c   | 256 --------
 .../powerpc/powerpc64/power8/chacha20_arch.h  |  37 --
 sysdeps/s390/s390-64/Makefile                 |   6 -
 sysdeps/s390/s390-64/chacha20-s390x.S         | 573 ------------------
 sysdeps/s390/s390-64/chacha20_arch.h          |  45 --
 sysdeps/unix/sysv/linux/tls-internal.c        |  10 -
 sysdeps/x86_64/Makefile                       |   7 -
 sysdeps/x86_64/chacha20-amd64-avx2.S          | 328 ----------
 sysdeps/x86_64/chacha20-amd64-sse2.S          | 311 ----------
 sysdeps/x86_64/chacha20_arch.h                |  55 --
 28 files changed, 53 insertions(+), 2661 deletions(-)
 delete mode 100644 stdlib/arc4random.h
 delete mode 100644 stdlib/chacha20.c
 delete mode 100644 stdlib/tst-arc4random-chacha20.c
 delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S
 delete mode 100644 sysdeps/aarch64/chacha20_arch.h
 delete mode 100644 sysdeps/generic/chacha20_arch.h
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
 delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
 delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
 delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
 delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
 delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
 delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
 delete mode 100644 sysdeps/x86_64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index cd04fb6e84..530893b1dc 100644
--- a/LICENSES
+++ b/LICENSES
@@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov>
  You should have received a copy of the GNU Lesser General Public
  License along with this library; if not, see
  <https://www.gnu.org/licenses/>.  */
-\f
-sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S,
-sysdeps/x86_64/chacha20-amd64-avx2.S, and
-sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and
-sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt,
-with the following notices:
-
-Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-This file is part of Libgcrypt.
-
-Libgcrypt is free software; you can redistribute it and/or modify
-it under the terms of the GNU Lesser General Public License as
-published by the Free Software Foundation; either version 2.1 of
-the License, or (at your option) any later version.
-
-Libgcrypt is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU Lesser General Public License for more details.
-
-You should have received a copy of the GNU Lesser General Public
-License along with this program; if not, see <https://www.gnu.org/licenses/>.
diff --git a/include/stdlib.h b/include/stdlib.h
index cae7f7cdf8..db51f4a4f6 100644
--- a/include/stdlib.h
+++ b/include/stdlib.h
@@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform;
 libc_hidden_proto (__arc4random_uniform);
 extern void __arc4random_buf_internal (void *buffer, size_t len)
      attribute_hidden;
-/* Called from the fork function to reinitialize the internal cipher state
-   in child process.  */
-extern void __arc4random_fork_subprocess (void) attribute_hidden;
 
 extern double __strtod_internal (const char *__restrict __nptr,
 				 char **__restrict __endptr, int __group)
diff --git a/stdlib/Makefile b/stdlib/Makefile
index a900962685..f7b25c1981 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -246,7 +246,6 @@ tests := \
   # tests
 
 tests-internal := \
-  tst-arc4random-chacha20 \
   tst-strtod1i \
   tst-strtod3 \
   tst-strtod4 \
@@ -256,7 +255,6 @@ tests-internal := \
   # tests-internal
 
 tests-static := \
-  tst-arc4random-chacha20 \
   tst-secure-getenv \
   # tests-static
 
diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
index 65547e79aa..80c55cde63 100644
--- a/stdlib/arc4random.c
+++ b/stdlib/arc4random.c
@@ -1,4 +1,4 @@
-/* Pseudo Random Number Generator based on ChaCha20.
+/* Pseudo Random Number Generator
    Copyright (C) 2022 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,61 +16,14 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <arc4random.h>
 #include <errno.h>
 #include <not-cancel.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include <sys/poll.h>
 #include <sys/mman.h>
 #include <sys/param.h>
 #include <sys/random.h>
-#include <tls-internal.h>
-
-/* arc4random keeps two counters: 'have' is the current valid bytes not yet
-   consumed in 'buf' while 'count' is the maximum number of bytes until a
-   reseed.
-
-   Both the initial seed and reseed try to obtain entropy from the kernel
-   and abort the process if none could be obtained.
-
-   The state 'buf' improves the usage of the cipher calls, allowing to call
-   optimized implementations (if the architecture provides it) and minimize
-   function call overhead.  */
-
-#include <chacha20.c>
-
-/* Called from the fork function to reset the state.  */
-void
-__arc4random_fork_subprocess (void)
-{
-  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
-  if (state != NULL)
-    {
-      explicit_bzero (state, sizeof (*state));
-      /* Force key init.  */
-      state->count = -1;
-    }
-}
-
-/* Return the current thread random state or try to create one if there is
-   none available.  In the case malloc can not allocate a state, arc4random
-   will try to get entropy with arc4random_getentropy.  */
-static struct arc4random_state_t *
-arc4random_get_state (void)
-{
-  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
-  if (state == NULL)
-    {
-      state = malloc (sizeof (struct arc4random_state_t));
-      if (state != NULL)
-	{
-	  /* Force key initialization on first call.  */
-	  state->count = -1;
-	  __glibc_tls_internal ()->rand_state = state;
-	}
-    }
-  return state;
-}
 
 static void
 arc4random_getrandom_failure (void)
@@ -78,106 +31,70 @@ arc4random_getrandom_failure (void)
   __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
 }
 
-static void
-arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen)
+void
+__arc4random_buf (void *p, size_t n)
 {
-  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
+  static bool have_getrandom = true, seen_initialized = false;
+  int fd;
 
-  /* Mix optional user provided data.  */
-  if (rnd != NULL)
-    {
-      size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-      for (size_t i = 0; i < m; i++)
-	state->buf[i] ^= rnd[i];
-    }
-
-  /* Immediately reinit for backtracking resistance.  */
-  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
-  explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-}
-
-static void
-arc4random_getentropy (void *rnd, size_t len)
-{
-  if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len)
+  if (n == 0)
     return;
 
-  int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom",
-						  O_RDONLY | O_CLOEXEC));
-  if (fd != -1)
+  for (;;)
     {
-      uint8_t *p = rnd;
-      uint8_t *end = p + len;
-      do
-	{
-	  ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
-	  if (ret <= 0)
-	    arc4random_getrandom_failure ();
-	  p += ret;
-	}
-      while (p < end);
+      ssize_t l;
 
-      if (__close_nocancel (fd) == 0)
-	return;
-    }
-  arc4random_getrandom_failure ();
-}
+      if (!have_getrandom)
+	break;
 
-/* Check if the thread context STATE should be reseed with kernel entropy
-   depending of requested LEN bytes.  If there is less than requested,
-   the state is either initialized or reseeded, otherwise the internal
-   counter subtract the requested length.  */
-static void
-arc4random_check_stir (struct arc4random_state_t *state, size_t len)
-{
-  if (state->count <= len || state->count == -1)
-    {
-      uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
-      arc4random_getentropy (rnd, sizeof rnd);
-
-      if (state->count == -1)
-	chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
-      else
-	arc4random_rekey (state, rnd, sizeof rnd);
-
-      explicit_bzero (rnd, sizeof rnd);
-
-      /* Invalidate the buf.  */
-      state->have = 0;
-      memset (state->buf, 0, sizeof state->buf);
-      state->count = CHACHA20_RESEED_SIZE;
+      l = __getrandom_nocancel (p, n, 0);
+      if (l > 0)
+	{
+	  if ((size_t) l == n)
+	    return; /* Done reading, success. */
+	  p = (uint8_t *) p + l;
+	  n -= l;
+	  continue; /* Interrupted by a signal; keep going. */
+	}
+      else if (l == 0)
+	arc4random_getrandom_failure (); /* Weird, should never happen. */
+      else if (errno == ENOSYS)
+	{
+	  have_getrandom = false;
+	  break; /* No syscall, so fallback to /dev/urandom. */
+	}
+      arc4random_getrandom_failure (); /* Unknown error, should never happen. */
     }
-  else
-    state->count -= len;
-}
 
-void
-__arc4random_buf (void *buffer, size_t len)
-{
-  struct arc4random_state_t *state = arc4random_get_state ();
-  if (__glibc_unlikely (state == NULL))
+  if (!seen_initialized)
     {
-      arc4random_getentropy (buffer, len);
-      return;
+      struct pollfd pfd = { .events = POLLIN };
+      pfd.fd = TEMP_FAILURE_RETRY (
+	  __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
+      if (pfd.fd < 0)
+	arc4random_getrandom_failure ();
+      if (__poll (&pfd, 1, -1) < 0)
+	arc4random_getrandom_failure ();
+      if (__close_nocancel (pfd.fd) < 0)
+	arc4random_getrandom_failure ();
+      seen_initialized = true;
     }
 
-  arc4random_check_stir (state, len);
-  while (len > 0)
+  fd = TEMP_FAILURE_RETRY (
+      __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY));
+  if (fd < 0)
+    arc4random_getrandom_failure ();
+  do
     {
-      if (state->have > 0)
-	{
-	  size_t m = MIN (len, state->have);
-	  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
-	  memcpy (buffer, ks, m);
-	  explicit_bzero (ks, m);
-	  buffer += m;
-	  len -= m;
-	  state->have -= m;
-	}
-      if (state->have == 0)
-	arc4random_rekey (state, NULL, 0);
+      ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
+      if (l <= 0)
+	arc4random_getrandom_failure ();
+      p = (uint8_t *) p + l;
+      n -= l;
     }
+  while (n);
+  if (__close_nocancel (fd) < 0)
+    arc4random_getrandom_failure ();
 }
 libc_hidden_def (__arc4random_buf)
 weak_alias (__arc4random_buf, arc4random_buf)
@@ -186,22 +103,7 @@ uint32_t
 __arc4random (void)
 {
   uint32_t r;
-
-  struct arc4random_state_t *state = arc4random_get_state ();
-  if (__glibc_unlikely (state == NULL))
-    {
-      arc4random_getentropy (&r, sizeof (uint32_t));
-      return r;
-    }
-
-  arc4random_check_stir (state, sizeof (uint32_t));
-  if (state->have < sizeof (uint32_t))
-    arc4random_rekey (state, NULL, 0);
-  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
-  memcpy (&r, ks, sizeof (uint32_t));
-  memset (ks, 0, sizeof (uint32_t));
-  state->have -= sizeof (uint32_t);
-
+  __arc4random_buf (&r, sizeof (r));
   return r;
 }
 libc_hidden_def (__arc4random)
diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
deleted file mode 100644
index cd39389c19..0000000000
--- a/stdlib/arc4random.h
+++ /dev/null
@@ -1,48 +0,0 @@
-/* Arc4random definition used on TLS.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#ifndef _CHACHA20_H
-#define _CHACHA20_H
-
-#include <stddef.h>
-#include <stdint.h>
-
-/* Internal ChaCha20 state.  */
-#define CHACHA20_STATE_LEN	16
-#define CHACHA20_BLOCK_SIZE	64
-
-/* Maximum number bytes until reseed (16 MB).  */
-#define CHACHA20_RESEED_SIZE	(16 * 1024 * 1024)
-
-/* Internal arc4random buffer, used on each feedback step so offer some
-   backtracking protection and to allow better used of vectorized
-   chacha20 implementations.  */
-#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
-
-_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE,
-		"CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE");
-
-struct arc4random_state_t
-{
-  uint32_t ctx[CHACHA20_STATE_LEN];
-  size_t have;
-  size_t count;
-  uint8_t buf[CHACHA20_BUFSIZE];
-};
-
-#endif
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
deleted file mode 100644
index 2745a81315..0000000000
--- a/stdlib/chacha20.c
+++ /dev/null
@@ -1,191 +0,0 @@
-/* Generic ChaCha20 implementation (used on arc4random).
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <array_length.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <string.h>
-
-/* 32-bit stream position, then 96-bit nonce.  */
-#define CHACHA20_IV_SIZE	16
-#define CHACHA20_KEY_SIZE	32
-
-#define CHACHA20_STATE_LEN	16
-
-/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final
-   XOR of the keystream with the plaintext because the plaintext is a
-   stream of zeros.  */
-
-enum chacha20_constants
-{
-  CHACHA20_CONSTANT_EXPA = 0x61707865U,
-  CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
-  CHACHA20_CONSTANT_2_BY = 0x79622d32U,
-  CHACHA20_CONSTANT_TE_K = 0x6b206574U
-};
-
-static inline uint32_t
-read_unaligned_32 (const uint8_t *p)
-{
-  uint32_t r;
-  memcpy (&r, p, sizeof (r));
-  return r;
-}
-
-static inline void
-write_unaligned_32 (uint8_t *p, uint32_t v)
-{
-  memcpy (p, &v, sizeof (v));
-}
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
-# define set_state(v)		__builtin_bswap32 ((v))
-#else
-# define read_unaligned_le32(p) read_unaligned_32 ((p))
-# define set_state(v)		(v)
-#endif
-
-static inline void
-chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
-{
-  state[0]  = CHACHA20_CONSTANT_EXPA;
-  state[1]  = CHACHA20_CONSTANT_ND_3;
-  state[2]  = CHACHA20_CONSTANT_2_BY;
-  state[3]  = CHACHA20_CONSTANT_TE_K;
-
-  state[4]  = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
-  state[5]  = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
-  state[6]  = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
-  state[7]  = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
-  state[8]  = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
-  state[9]  = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
-  state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
-  state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
-
-  state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
-  state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
-  state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
-  state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
-}
-
-static inline uint32_t
-rotl32 (unsigned int shift, uint32_t word)
-{
-  return (word << (shift & 31)) | (word >> ((-shift) & 31));
-}
-
-static void
-state_final (const uint8_t *src, uint8_t *dst, uint32_t v)
-{
-#ifdef CHACHA20_XOR_FINAL
-  v ^= read_unaligned_32 (src);
-#endif
-  write_unaligned_32 (dst, v);
-}
-
-static inline void
-chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src)
-{
-  uint32_t x0, x1, x2, x3, x4, x5, x6, x7;
-  uint32_t x8, x9, x10, x11, x12, x13, x14, x15;
-
-  x0 = state[0];
-  x1 = state[1];
-  x2 = state[2];
-  x3 = state[3];
-  x4 = state[4];
-  x5 = state[5];
-  x6 = state[6];
-  x7 = state[7];
-  x8 = state[8];
-  x9 = state[9];
-  x10 = state[10];
-  x11 = state[11];
-  x12 = state[12];
-  x13 = state[13];
-  x14 = state[14];
-  x15 = state[15];
-
-  for (int i = 0; i < 20; i += 2)
-    {
-#define QROUND(_x0, _x1, _x2, _x3) 			\
-  do {							\
-   _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); 	\
-   _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); 	\
-   _x0 = _x0 + _x1; _x3 = rotl32 (8,  (_x0 ^ _x3));	\
-   _x2 = _x2 + _x3; _x1 = rotl32 (7,  (_x1 ^ _x2));	\
-  } while(0)
-
-      QROUND (x0, x4, x8,  x12);
-      QROUND (x1, x5, x9,  x13);
-      QROUND (x2, x6, x10, x14);
-      QROUND (x3, x7, x11, x15);
-
-      QROUND (x0, x5, x10, x15);
-      QROUND (x1, x6, x11, x12);
-      QROUND (x2, x7, x8,  x13);
-      QROUND (x3, x4, x9,  x14);
-    }
-
-  state_final (&src[0], &dst[0], set_state (x0 + state[0]));
-  state_final (&src[4], &dst[4], set_state (x1 + state[1]));
-  state_final (&src[8], &dst[8], set_state (x2 + state[2]));
-  state_final (&src[12], &dst[12], set_state (x3 + state[3]));
-  state_final (&src[16], &dst[16], set_state (x4 + state[4]));
-  state_final (&src[20], &dst[20], set_state (x5 + state[5]));
-  state_final (&src[24], &dst[24], set_state (x6 + state[6]));
-  state_final (&src[28], &dst[28], set_state (x7 + state[7]));
-  state_final (&src[32], &dst[32], set_state (x8 + state[8]));
-  state_final (&src[36], &dst[36], set_state (x9 + state[9]));
-  state_final (&src[40], &dst[40], set_state (x10 + state[10]));
-  state_final (&src[44], &dst[44], set_state (x11 + state[11]));
-  state_final (&src[48], &dst[48], set_state (x12 + state[12]));
-  state_final (&src[52], &dst[52], set_state (x13 + state[13]));
-  state_final (&src[56], &dst[56], set_state (x14 + state[14]));
-  state_final (&src[60], &dst[60], set_state (x15 + state[15]));
-
-  state[12]++;
-}
-
-static void
-__attribute_maybe_unused__
-chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
-			size_t bytes)
-{
-  while (bytes >= CHACHA20_BLOCK_SIZE)
-    {
-      chacha20_block (state, dst, src);
-
-      bytes -= CHACHA20_BLOCK_SIZE;
-      dst += CHACHA20_BLOCK_SIZE;
-      src += CHACHA20_BLOCK_SIZE;
-    }
-
-  if (__glibc_unlikely (bytes != 0))
-    {
-      uint8_t stream[CHACHA20_BLOCK_SIZE];
-      chacha20_block (state, stream, src);
-      memcpy (dst, stream, bytes);
-      explicit_bzero (stream, sizeof stream);
-    }
-}
-
-/* Get the architecture optimized version.  */
-#include <chacha20_arch.h>
diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c
deleted file mode 100644
index 45ba54920d..0000000000
--- a/stdlib/tst-arc4random-chacha20.c
+++ /dev/null
@@ -1,167 +0,0 @@
-/* Basic tests for chacha20 cypher used in arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <arc4random.h>
-#include <support/check.h>
-#include <sys/cdefs.h>
-
-/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random
-   actual does.  */
-#include <chacha20.c>
-
-static int
-do_test (void)
-{
-  const uint8_t key[CHACHA20_KEY_SIZE] =
-    {
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-    };
-  const uint8_t iv[CHACHA20_IV_SIZE] =
-    {
-      0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-    };
-  const uint8_t expected1[CHACHA20_BUFSIZE] =
-    {
-      0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a,
-      0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d,
-      0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda,
-      0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f,
-      0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1,
-      0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07,
-      0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73,
-      0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69,
-      0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21,
-      0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8,
-      0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac,
-      0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6,
-      0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0,
-      0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3,
-      0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f,
-      0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e,
-      0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71,
-      0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3,
-      0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5,
-      0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34,
-      0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d,
-      0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c,
-      0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31,
-      0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad,
-      0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0,
-      0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72,
-      0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16,
-      0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9,
-      0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a,
-      0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc,
-      0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2,
-      0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb,
-      0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10,
-      0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b,
-      0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2,
-      0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17,
-      0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca,
-      0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84,
-      0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3,
-      0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1,
-      0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7,
-      0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9,
-      0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3,
-      0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e,
-      0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb,
-      0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8,
-      0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb
-    };
-
-  const uint8_t expected2[CHACHA20_BUFSIZE] =
-    {
-      0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36,
-      0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22,
-      0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50,
-      0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10,
-      0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef,
-      0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30,
-      0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c,
-      0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6,
-      0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e,
-      0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90,
-      0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3,
-      0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba,
-      0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4,
-      0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea,
-      0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a,
-      0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80,
-      0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2,
-      0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8,
-      0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce,
-      0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb,
-      0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9,
-      0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc,
-      0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86,
-      0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0,
-      0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4,
-      0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1,
-      0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4,
-      0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f,
-      0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57,
-      0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97,
-      0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef,
-      0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13,
-      0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65,
-      0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30,
-      0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6,
-      0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee,
-      0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c,
-      0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66,
-      0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a,
-      0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c,
-      0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65,
-      0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b,
-      0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88,
-      0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1,
-      0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0,
-      0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36,
-      0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18
-    };
-
-  /* Check with the expected internal arc4random keystream buffer.  Some
-     architecture optimizations expects a buffer with a minimum size which
-     is a multiple of then ChaCha20 blocksize, so they might not be prepared
-     to handle smaller buffers.  */
-
-  uint8_t output[CHACHA20_BUFSIZE];
-
-  uint32_t state[CHACHA20_STATE_LEN];
-  chacha20_init (state, key, iv);
-
-  /* Check with the initial state.  */
-  uint8_t input[CHACHA20_BUFSIZE] = { 0 };
-
-  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
-  TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE);
-
-  /* And on the next round.  */
-  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
-  TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE);
-
-  return 0;
-}
-
-#include <support/test-driver.c>
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 7dfd1b62dd..17fb1c5b72 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -51,10 +51,6 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += tlsdesc.sym
 endif
 
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-aarch64
-endif
-
 ifeq ($(subdir),gmon)
 CFLAGS-mcount.c += -mgeneral-regs-only
 endif
diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S
deleted file mode 100644
index cce5291c5c..0000000000
--- a/sysdeps/aarch64/chacha20-aarch64.S
+++ /dev/null
@@ -1,314 +0,0 @@
-/* Optimized AArch64 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-
-/* Only LE is supported.  */
-#ifdef __AARCH64EL__
-
-#define GET_DATA_POINTER(reg, name) \
-        adrp    reg, name ; \
-        add     reg, reg, :lo12:name
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
-        ret; dsb sy; isb;
-
-.cpu generic+simd
-
-.text
-
-/* register macros */
-#define INPUT     x0
-#define DST       x1
-#define SRC       x2
-#define NBLKS     x3
-#define ROUND     x4
-#define INPUT_CTR x5
-#define INPUT_POS x6
-#define CTR       x7
-
-/* vector registers */
-#define X0 v16
-#define X4 v17
-#define X8 v18
-#define X12 v19
-
-#define X1 v20
-#define X5 v21
-
-#define X9 v22
-#define X13 v23
-#define X2 v24
-#define X6 v25
-
-#define X3 v26
-#define X7 v27
-#define X11 v28
-#define X15 v29
-
-#define X10 v30
-#define X14 v31
-
-#define VCTR    v0
-#define VTMP0   v1
-#define VTMP1   v2
-#define VTMP2   v3
-#define VTMP3   v4
-#define X12_TMP v5
-#define X13_TMP v6
-#define ROT8    v7
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-#define _(...) __VA_ARGS__
-
-#define vpunpckldq(s1, s2, dst) \
-	zip1 dst.4s, s2.4s, s1.4s;
-
-#define vpunpckhdq(s1, s2, dst) \
-	zip2 dst.4s, s2.4s, s1.4s;
-
-#define vpunpcklqdq(s1, s2, dst) \
-	zip1 dst.2d, s2.2d, s1.2d;
-
-#define vpunpckhqdq(s1, s2, dst) \
-	zip2 dst.2d, s2.2d, s1.2d;
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
-	vpunpckhdq(x1, x0, t2); \
-	vpunpckldq(x1, x0, x0); \
-	\
-	vpunpckldq(x3, x2, t1); \
-	vpunpckhdq(x3, x2, x2); \
-	\
-	vpunpckhqdq(t1, x0, x1); \
-	vpunpcklqdq(t1, x0, x0); \
-	\
-	vpunpckhqdq(x2, t2, x3); \
-	vpunpcklqdq(x2, t2, x2);
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define XOR(d,s1,s2) \
-	eor d.16b, s2.16b, s1.16b;
-
-#define PLUS(ds,s) \
-	add ds.4s, ds.4s, s.4s;
-
-#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \
-	shl dst1.4s, src1.4s, #(c);		\
-	shl dst2.4s, src2.4s, #(c);		\
-	shl dst3.4s, src3.4s, #(c);		\
-	shl dst4.4s, src4.4s, #(c);		\
-	sri dst1.4s, src1.4s, #(32 - (c));	\
-	sri dst2.4s, src2.4s, #(32 - (c));	\
-	sri dst3.4s, src3.4s, #(32 - (c));	\
-	sri dst4.4s, src4.4s, #(32 - (c));
-
-#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
-	tbl dst1.16b, {src1.16b}, ROT8.16b;     \
-	tbl dst2.16b, {src2.16b}, ROT8.16b;	\
-	tbl dst3.16b, {src3.16b}, ROT8.16b;	\
-	tbl dst4.16b, {src4.16b}, ROT8.16b;
-
-#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
-	rev32 dst1.8h, src1.8h;			\
-	rev32 dst2.8h, src2.8h;			\
-	rev32 dst3.8h, src3.8h;			\
-	rev32 dst4.8h, src4.8h;
-
-#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \
-	PLUS(a1,b1); PLUS(a2,b2);						\
-	PLUS(a3,b3); PLUS(a4,b4);						\
-	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
-	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
-		ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4);		\
-	PLUS(c1,d1); PLUS(c2,d2);						\
-	PLUS(c3,d3); PLUS(c4,d4);						\
-	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
-	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
-		ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4)		\
-	PLUS(a1,b1); PLUS(a2,b2);						\
-	PLUS(a3,b3); PLUS(a4,b4);						\
-	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
-	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
-		ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4)		\
-	PLUS(c1,d1); PLUS(c2,d2);						\
-	PLUS(c3,d3); PLUS(c4,d4);						\
-	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
-	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
-		ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4)		\
-
-.align 4
-L(__chacha20_blocks4_data_inc_counter):
-	.long 0,1,2,3
-
-.align 4
-L(__chacha20_blocks4_data_rot8):
-	.byte 3,0,1,2
-	.byte 7,4,5,6
-	.byte 11,8,9,10
-	.byte 15,12,13,14
-
-.hidden __chacha20_neon_blocks4
-ENTRY (__chacha20_neon_blocks4)
-	/* input:
-	 *	x0: input
-	 *	x1: dst
-	 *	x2: src
-	 *	x3: nblks (multiple of 4)
-	 */
-
-	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8))
-	add INPUT_CTR, INPUT, #(12*4);
-	ld1 {ROT8.16b}, [CTR];
-	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter))
-	mov INPUT_POS, INPUT;
-	ld1 {VCTR.16b}, [CTR];
-
-L(loop4):
-	/* Construct counter vectors X12 and X13 */
-
-	ld1 {X15.16b}, [INPUT_CTR];
-	mov ROUND, #20;
-	ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
-
-	dup X12.4s, X15.s[0];
-	dup X13.4s, X15.s[1];
-	ldr CTR, [INPUT_CTR];
-	add X12.4s, X12.4s, VCTR.4s;
-	dup X0.4s, VTMP1.s[0];
-	dup X1.4s, VTMP1.s[1];
-	dup X2.4s, VTMP1.s[2];
-	dup X3.4s, VTMP1.s[3];
-	dup X14.4s, X15.s[2];
-	cmhi VTMP0.4s, VCTR.4s, X12.4s;
-	dup X15.4s, X15.s[3];
-	add CTR, CTR, #4; /* Update counter */
-	dup X4.4s, VTMP2.s[0];
-	dup X5.4s, VTMP2.s[1];
-	dup X6.4s, VTMP2.s[2];
-	dup X7.4s, VTMP2.s[3];
-	sub X13.4s, X13.4s, VTMP0.4s;
-	dup X8.4s, VTMP3.s[0];
-	dup X9.4s, VTMP3.s[1];
-	dup X10.4s, VTMP3.s[2];
-	dup X11.4s, VTMP3.s[3];
-	mov X12_TMP.16b, X12.16b;
-	mov X13_TMP.16b, X13.16b;
-	str CTR, [INPUT_CTR];
-
-L(round2):
-	subs ROUND, ROUND, #2
-	QUARTERROUND4(X0, X4,  X8, X12,   X1, X5,  X9, X13,
-		      X2, X6, X10, X14,   X3, X7, X11, X15,
-		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
-	QUARTERROUND4(X0, X5, X10, X15,   X1, X6, X11, X12,
-		      X2, X7,  X8, X13,   X3, X4,  X9, X14,
-		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
-	b.ne L(round2);
-
-	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
-
-	PLUS(X12, X12_TMP);        /* INPUT + 12 * 4 + counter */
-	PLUS(X13, X13_TMP);        /* INPUT + 13 * 4 + counter */
-
-	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
-	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
-	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
-	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
-	PLUS(X0, VTMP2);
-	PLUS(X1, VTMP3);
-	PLUS(X2, X12_TMP);
-	PLUS(X3, X13_TMP);
-
-	dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
-	dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
-	dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
-	dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
-	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
-	mov INPUT_POS, INPUT;
-	PLUS(X4, VTMP2);
-	PLUS(X5, VTMP3);
-	PLUS(X6, X12_TMP);
-	PLUS(X7, X13_TMP);
-
-	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
-	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
-	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
-	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
-	dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
-	dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
-	PLUS(X8, VTMP2);
-	PLUS(X9, VTMP3);
-	PLUS(X10, X12_TMP);
-	PLUS(X11, X13_TMP);
-	PLUS(X14, VTMP0);
-	PLUS(X15, VTMP1);
-
-	transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
-
-	subs NBLKS, NBLKS, #4;
-
-	st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64
-	st1 {X1.16b,X5.16b}, [DST], #32;
-	st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64
-	st1 {X10.16b,X14.16b}, [DST], #32;
-	st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64;
-
-	b.ne L(loop4);
-
-	ret_spec_stop
-END (__chacha20_neon_blocks4)
-
-#endif
diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
deleted file mode 100644
index 37dbb917f1..0000000000
--- a/sysdeps/aarch64/chacha20_arch.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <ldsodefs.h>
-#include <stdbool.h>
-
-unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
-#ifdef __AARCH64EL__
-  __chacha20_neon_blocks4 (state, dst, src,
-			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  chacha20_crypt_generic (state, dst, src, bytes);
-#endif
-}
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
deleted file mode 100644
index 1b4559ccbc..0000000000
--- a/sysdeps/generic/chacha20_arch.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* Chacha20 implementation, generic interface for encrypt.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
index 8a0f37d509..b32b31b5a9 100644
--- a/sysdeps/generic/tls-internal.c
+++ b/sysdeps/generic/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -27,13 +26,4 @@ __glibc_tls_internal_free (void)
 {
   free (__tls_internal.strsignal_buf);
   free (__tls_internal.strerror_l_buf);
-
-  if (__tls_internal.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-	 cached it won't leak any data.  */
-      explicit_bzero (__tls_internal.rand_state,
-		      sizeof (*__tls_internal.rand_state));
-      free (__tls_internal.rand_state);
-    }
 }
diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
index 667068c8cf..e60b86fab1 100644
--- a/sysdeps/mach/hurd/_Fork.c
+++ b/sysdeps/mach/hurd/_Fork.c
@@ -662,8 +662,6 @@ retry:
       _hurd_malloc_fork_child ();
       call_function_static_weak (__malloc_fork_unlock_child);
 
-      call_function_static_weak (__arc4random_fork_subprocess);
-
       /* Run things that want to run in the child task to set up.  */
       RUN_HOOK (_hurd_fork_child_hook, ());
 
diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index 7dc02569f6..dd568992e2 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -43,8 +43,6 @@ _Fork (void)
       self->robust_head.list = &self->robust_head;
       INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
 			     sizeof (struct robust_list_head));
-
-      call_function_static_weak (__arc4random_fork_subprocess);
     }
   return pid;
 }
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
deleted file mode 100644
index 8c75165f7f..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
+++ /dev/null
@@ -1,4 +0,0 @@
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
deleted file mode 100644
index cf9e735326..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
deleted file mode 100644
index 08494dc045..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
+++ /dev/null
@@ -1,42 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-					const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-		const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  unsigned long int hwcap = GLRO(dl_hwcap);
-  unsigned long int hwcap2 = GLRO(dl_hwcap2);
-  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
-    __chacha20_power8_blocks4 (state, dst, src,
-			       CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
index abb0aa3f11..71a59529f3 100644
--- a/sysdeps/powerpc/powerpc64/power8/Makefile
+++ b/sysdeps/powerpc/powerpc64/power8/Makefile
@@ -1,8 +1,3 @@
 ifeq ($(subdir),string)
 sysdep_routines += strcasestr-ppc64
 endif
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
deleted file mode 100644
index 0bbdcb9363..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/* Optimized PowerPC implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20
-   Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <altivec.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <sys/cdefs.h>
-
-typedef vector unsigned char vector16x_u8;
-typedef vector unsigned int vector4x_u32;
-typedef vector unsigned long long vector2x_u64;
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-static const vector16x_u8 le_bswap_const =
-  { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
-#endif
-
-static inline vector4x_u32
-vec_rol_elems (vector4x_u32 v, unsigned int idx)
-{
-#if __BYTE_ORDER != __BIG_ENDIAN
-  return vec_sld (v, v, (16 - (4 * idx)) & 15);
-#else
-  return vec_sld (v, v, (4 * idx) & 15);
-#endif
-}
-
-static inline vector4x_u32
-vec_load_le (unsigned long offset, const unsigned char *ptr)
-{
-  vector4x_u32 vec;
-  vec = vec_vsx_ld (offset, (const uint32_t *)ptr);
-#if __BYTE_ORDER == __BIG_ENDIAN
-  vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec,
-				 le_bswap_const);
-#endif
-  return vec;
-}
-
-static inline void
-vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
-  vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec,
-			       le_bswap_const);
-#endif
-  vec_vsx_st (vec, offset, (uint32_t *)ptr);
-}
-
-
-static inline vector4x_u32
-vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
-  static const vector16x_u8 swap32 =
-    { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 };
-  vector2x_u64 vec, add, sum;
-
-  vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32);
-  add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32);
-  sum = vec + add;
-  return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32);
-#else
-  return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a));
-#endif
-}
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define ROTATE(v1,rolv)			\
-	__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv))
-
-#define PLUS(ds,s) \
-	((ds) += (s))
-
-#define XOR(ds,s) \
-	((ds) ^= (s))
-
-#define ADD_U64(v,a) \
-	(v = vec_add_ctr_u64(v, a))
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3) ({ \
-	vector4x_u32 t1 = vec_mergeh(x0, x2); \
-	vector4x_u32 t2 = vec_mergel(x0, x2); \
-	vector4x_u32 t3 = vec_mergeh(x1, x3); \
-	x3 = vec_mergel(x1, x3); \
-	x0 = vec_mergeh(t1, t3); \
-	x1 = vec_mergel(t1, t3); \
-	x2 = vec_mergeh(t2, x3); \
-	x3 = vec_mergel(t2, x3); \
-      })
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2)			\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE(d1, rotate_16); ROTATE(d2, rotate_16);	\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE(b1, rotate_12); ROTATE(b2, rotate_12);	\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE(d1, rotate_8); ROTATE(d2, rotate_8);		\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE(b1, rotate_7); ROTATE(b2, rotate_7);
-
-unsigned int attribute_hidden
-__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src,
-			   size_t nblks)
-{
-  vector4x_u32 counters_0123 = { 0, 1, 2, 3 };
-  vector4x_u32 counter_4 = { 4, 0, 0, 0 };
-  vector4x_u32 rotate_16 = { 16, 16, 16, 16 };
-  vector4x_u32 rotate_12 = { 12, 12, 12, 12 };
-  vector4x_u32 rotate_8 = { 8, 8, 8, 8 };
-  vector4x_u32 rotate_7 = { 7, 7, 7, 7 };
-  vector4x_u32 state0, state1, state2, state3;
-  vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7;
-  vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15;
-  vector4x_u32 tmp;
-  int i;
-
-  /* Force preload of constants to vector registers.  */
-  __asm__ ("": "+v" (counters_0123) :: "memory");
-  __asm__ ("": "+v" (counter_4) :: "memory");
-  __asm__ ("": "+v" (rotate_16) :: "memory");
-  __asm__ ("": "+v" (rotate_12) :: "memory");
-  __asm__ ("": "+v" (rotate_8) :: "memory");
-  __asm__ ("": "+v" (rotate_7) :: "memory");
-
-  state0 = vec_vsx_ld (0 * 16, state);
-  state1 = vec_vsx_ld (1 * 16, state);
-  state2 = vec_vsx_ld (2 * 16, state);
-  state3 = vec_vsx_ld (3 * 16, state);
-
-  do
-    {
-      v0 = vec_splat (state0, 0);
-      v1 = vec_splat (state0, 1);
-      v2 = vec_splat (state0, 2);
-      v3 = vec_splat (state0, 3);
-      v4 = vec_splat (state1, 0);
-      v5 = vec_splat (state1, 1);
-      v6 = vec_splat (state1, 2);
-      v7 = vec_splat (state1, 3);
-      v8 = vec_splat (state2, 0);
-      v9 = vec_splat (state2, 1);
-      v10 = vec_splat (state2, 2);
-      v11 = vec_splat (state2, 3);
-      v12 = vec_splat (state3, 0);
-      v13 = vec_splat (state3, 1);
-      v14 = vec_splat (state3, 2);
-      v15 = vec_splat (state3, 3);
-
-      v12 += counters_0123;
-      v13 -= vec_cmplt (v12, counters_0123);
-
-      for (i = 20; i > 0; i -= 2)
-	{
-	  QUARTERROUND2 (v0, v4,  v8, v12,   v1, v5,  v9, v13)
-	  QUARTERROUND2 (v2, v6, v10, v14,   v3, v7, v11, v15)
-	  QUARTERROUND2 (v0, v5, v10, v15,   v1, v6, v11, v12)
-	  QUARTERROUND2 (v2, v7,  v8, v13,   v3, v4,  v9, v14)
-	}
-
-      v0 += vec_splat (state0, 0);
-      v1 += vec_splat (state0, 1);
-      v2 += vec_splat (state0, 2);
-      v3 += vec_splat (state0, 3);
-      v4 += vec_splat (state1, 0);
-      v5 += vec_splat (state1, 1);
-      v6 += vec_splat (state1, 2);
-      v7 += vec_splat (state1, 3);
-      v8 += vec_splat (state2, 0);
-      v9 += vec_splat (state2, 1);
-      v10 += vec_splat (state2, 2);
-      v11 += vec_splat (state2, 3);
-      tmp = vec_splat( state3, 0);
-      tmp += counters_0123;
-      v12 += tmp;
-      v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123);
-      v14 += vec_splat (state3, 2);
-      v15 += vec_splat (state3, 3);
-      ADD_U64 (state3, counter_4);
-
-      transpose_4x4 (v0, v1, v2, v3);
-      transpose_4x4 (v4, v5, v6, v7);
-      transpose_4x4 (v8, v9, v10, v11);
-      transpose_4x4 (v12, v13, v14, v15);
-
-      vec_store_le (v0, (64 * 0 + 16 * 0), dst);
-      vec_store_le (v1, (64 * 1 + 16 * 0), dst);
-      vec_store_le (v2, (64 * 2 + 16 * 0), dst);
-      vec_store_le (v3, (64 * 3 + 16 * 0), dst);
-
-      vec_store_le (v4, (64 * 0 + 16 * 1), dst);
-      vec_store_le (v5, (64 * 1 + 16 * 1), dst);
-      vec_store_le (v6, (64 * 2 + 16 * 1), dst);
-      vec_store_le (v7, (64 * 3 + 16 * 1), dst);
-
-      vec_store_le (v8, (64 * 0 + 16 * 2), dst);
-      vec_store_le (v9, (64 * 1 + 16 * 2), dst);
-      vec_store_le (v10, (64 * 2 + 16 * 2), dst);
-      vec_store_le (v11, (64 * 3 + 16 * 2), dst);
-
-      vec_store_le (v12, (64 * 0 + 16 * 3), dst);
-      vec_store_le (v13, (64 * 1 + 16 * 3), dst);
-      vec_store_le (v14, (64 * 2 + 16 * 3), dst);
-      vec_store_le (v15, (64 * 3 + 16 * 3), dst);
-
-      src += 4*64;
-      dst += 4*64;
-
-      nblks -= 4;
-    }
-  while (nblks);
-
-  vec_vsx_st (state3, 3 * 16, state);
-
-  return 0;
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
deleted file mode 100644
index ded06762b6..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
+++ /dev/null
@@ -1,37 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-					const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-		const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  __chacha20_power8_blocks4 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-}
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index 96c110f490..66ed844e68 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache
 endif
 
 endif # $(subdir) == elf
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
-  chacha20-s390x \
-  # sysdep_routines
-endif
diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S
deleted file mode 100644
index e38504d370..0000000000
--- a/sysdeps/s390/s390-64/chacha20-s390x.S
+++ /dev/null
@@ -1,573 +0,0 @@
-/* Optimized s390x implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-s390x.S  -  zSeries implementation of ChaCha20 cipher
-
-   Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <sysdep.h>
-
-#ifdef HAVE_S390_VX_ASM_SUPPORT
-
-/* CFA expressions are used for pointing CFA and registers to
- * SP relative offsets. */
-# define DW_REGNO_SP 15
-
-/* Fixed length encoding used for integers for now. */
-# define DW_SLEB128_7BIT(value) \
-        0x00|((value) & 0x7f)
-# define DW_SLEB128_28BIT(value) \
-        0x80|((value)&0x7f), \
-        0x80|(((value)>>7)&0x7f), \
-        0x80|(((value)>>14)&0x7f), \
-        0x00|(((value)>>21)&0x7f)
-
-# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \
-        .cfi_escape \
-          0x0f, /* DW_CFA_def_cfa_expression */ \
-            DW_SLEB128_7BIT(11), /* length */ \
-          0x7f, /* DW_OP_breg15, rsp + constant */ \
-            DW_SLEB128_28BIT(rsp_offs), \
-          0x06, /* DW_OP_deref */ \
-          0x23, /* DW_OP_plus_constu */ \
-            DW_SLEB128_28BIT((cfa_depth)+160)
-
-.machine "z13+vx"
-.text
-
-.balign 16
-.Lconsts:
-.Lwordswap:
-	.byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3
-.Lbswap128:
-	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-.Lbswap32:
-	.byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
-.Lone:
-	.long 0, 0, 0, 1
-.Ladd_counter_0123:
-	.long 0, 1, 2, 3
-.Ladd_counter_4567:
-	.long 4, 5, 6, 7
-
-/* register macros */
-#define INPUT %r2
-#define DST   %r3
-#define SRC   %r4
-#define NBLKS %r0
-#define ROUND %r1
-
-/* stack structure */
-
-#define STACK_FRAME_STD    (8 * 16 + 8 * 4)
-#define STACK_FRAME_F8_F15 (8 * 8)
-#define STACK_FRAME_Y0_Y15 (16 * 16)
-#define STACK_FRAME_CTR    (4 * 16)
-#define STACK_FRAME_PARAMS (6 * 8)
-
-#define STACK_MAX   (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \
-		     STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \
-		     STACK_FRAME_PARAMS)
-
-#define STACK_F8     (STACK_MAX - STACK_FRAME_F8_F15)
-#define STACK_F9     (STACK_F8 + 8)
-#define STACK_F10    (STACK_F9 + 8)
-#define STACK_F11    (STACK_F10 + 8)
-#define STACK_F12    (STACK_F11 + 8)
-#define STACK_F13    (STACK_F12 + 8)
-#define STACK_F14    (STACK_F13 + 8)
-#define STACK_F15    (STACK_F14 + 8)
-#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15)
-#define STACK_CTR    (STACK_Y0_Y15 - STACK_FRAME_CTR)
-#define STACK_INPUT  (STACK_CTR - STACK_FRAME_PARAMS)
-#define STACK_DST    (STACK_INPUT + 8)
-#define STACK_SRC    (STACK_DST + 8)
-#define STACK_NBLKS  (STACK_SRC + 8)
-#define STACK_POCTX  (STACK_NBLKS + 8)
-#define STACK_POSRC  (STACK_POCTX + 8)
-
-#define STACK_G0_H3  STACK_Y0_Y15
-
-/* vector registers */
-#define A0 %v0
-#define A1 %v1
-#define A2 %v2
-#define A3 %v3
-
-#define B0 %v4
-#define B1 %v5
-#define B2 %v6
-#define B3 %v7
-
-#define C0 %v8
-#define C1 %v9
-#define C2 %v10
-#define C3 %v11
-
-#define D0 %v12
-#define D1 %v13
-#define D2 %v14
-#define D3 %v15
-
-#define E0 %v16
-#define E1 %v17
-#define E2 %v18
-#define E3 %v19
-
-#define F0 %v20
-#define F1 %v21
-#define F2 %v22
-#define F3 %v23
-
-#define G0 %v24
-#define G1 %v25
-#define G2 %v26
-#define G3 %v27
-
-#define H0 %v28
-#define H1 %v29
-#define H2 %v30
-#define H3 %v31
-
-#define IO0 E0
-#define IO1 E1
-#define IO2 E2
-#define IO3 E3
-#define IO4 F0
-#define IO5 F1
-#define IO6 F2
-#define IO7 F3
-
-#define S0 G0
-#define S1 G1
-#define S2 G2
-#define S3 G3
-
-#define TMP0 H0
-#define TMP1 H1
-#define TMP2 H2
-#define TMP3 H3
-
-#define X0 A0
-#define X1 A1
-#define X2 A2
-#define X3 A3
-#define X4 B0
-#define X5 B1
-#define X6 B2
-#define X7 B3
-#define X8 C0
-#define X9 C1
-#define X10 C2
-#define X11 C3
-#define X12 D0
-#define X13 D1
-#define X14 D2
-#define X15 D3
-
-#define Y0 E0
-#define Y1 E1
-#define Y2 E2
-#define Y3 E3
-#define Y4 F0
-#define Y5 F1
-#define Y6 F2
-#define Y7 F3
-#define Y8 G0
-#define Y9 G1
-#define Y10 G2
-#define Y11 G3
-#define Y12 H0
-#define Y13 H1
-#define Y14 H2
-#define Y15 H3
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-#define _ /*_*/
-
-#define START_STACK(last_r) \
-	lgr %r0, %r15; \
-	lghi %r1, ~15; \
-	stmg %r6, last_r, 6 * 8(%r15); \
-	aghi %r0, -STACK_MAX; \
-	ngr %r0, %r1; \
-	lgr %r1, %r15; \
-	cfi_def_cfa_register(1); \
-	lgr %r15, %r0; \
-	stg %r1, 0(%r15); \
-	cfi_cfa_on_stack(0, 0); \
-	std %f8, STACK_F8(%r15); \
-	std %f9, STACK_F9(%r15); \
-	std %f10, STACK_F10(%r15); \
-	std %f11, STACK_F11(%r15); \
-	std %f12, STACK_F12(%r15); \
-	std %f13, STACK_F13(%r15); \
-	std %f14, STACK_F14(%r15); \
-	std %f15, STACK_F15(%r15);
-
-#define END_STACK(last_r) \
-	lg %r1, 0(%r15); \
-	ld %f8, STACK_F8(%r15); \
-	ld %f9, STACK_F9(%r15); \
-	ld %f10, STACK_F10(%r15); \
-	ld %f11, STACK_F11(%r15); \
-	ld %f12, STACK_F12(%r15); \
-	ld %f13, STACK_F13(%r15); \
-	ld %f14, STACK_F14(%r15); \
-	ld %f15, STACK_F15(%r15); \
-	lmg %r6, last_r, 6 * 8(%r1); \
-	lgr %r15, %r1; \
-	cfi_def_cfa_register(DW_REGNO_SP);
-
-#define PLUS(dst,src) \
-	vaf dst, dst, src;
-
-#define XOR(dst,src) \
-	vx dst, dst, src;
-
-#define ROTATE(v1,c) \
-	verllf v1, v1, (c)(0);
-
-#define WORD_ROTATE(v1,s) \
-	vsldb v1, v1, v1, ((s) * 4);
-
-#define DST_8(OPER, I, J) \
-	OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \
-	OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J);
-
-/**********************************************************************
-  round macros
- **********************************************************************/
-
-/**********************************************************************
-  8-way chacha20 ("vertical")
- **********************************************************************/
-
-#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
-			      x8,x9,x10,x11,x12,x13,x14,x15,\
-			      y0,y1,y2,y3,y4,y5,y6,y7,\
-			      y8,y9,y10,y11,y12,y13,y14,y15,\
-			      op1,op2,op3,op4,op5,op6,op7,op8,\
-			      op9,op10,op11,op12) \
-	op1;							\
-	PLUS(x0, x1); PLUS(x4, x5);				\
-	PLUS(x8, x9); PLUS(x12, x13);				\
-	PLUS(y0, y1); PLUS(y4, y5);				\
-	PLUS(y8, y9); PLUS(y12, y13);				\
-	    op2;						\
-	    XOR(x3, x0);  XOR(x7, x4);				\
-	    XOR(x11, x8); XOR(x15, x12);			\
-	    XOR(y3, y0);  XOR(y7, y4);				\
-	    XOR(y11, y8); XOR(y15, y12);			\
-		op3;						\
-		ROTATE(x3, 16); ROTATE(x7, 16);			\
-		ROTATE(x11, 16); ROTATE(x15, 16);		\
-		ROTATE(y3, 16); ROTATE(y7, 16);			\
-		ROTATE(y11, 16); ROTATE(y15, 16);		\
-	op4;							\
-	PLUS(x2, x3); PLUS(x6, x7);				\
-	PLUS(x10, x11); PLUS(x14, x15);				\
-	PLUS(y2, y3); PLUS(y6, y7);				\
-	PLUS(y10, y11); PLUS(y14, y15);				\
-	    op5;						\
-	    XOR(x1, x2); XOR(x5, x6);				\
-	    XOR(x9, x10); XOR(x13, x14);			\
-	    XOR(y1, y2); XOR(y5, y6);				\
-	    XOR(y9, y10); XOR(y13, y14);			\
-		op6;						\
-		ROTATE(x1,12); ROTATE(x5,12);			\
-		ROTATE(x9,12); ROTATE(x13,12);			\
-		ROTATE(y1,12); ROTATE(y5,12);			\
-		ROTATE(y9,12); ROTATE(y13,12);			\
-	op7;							\
-	PLUS(x0, x1); PLUS(x4, x5);				\
-	PLUS(x8, x9); PLUS(x12, x13);				\
-	PLUS(y0, y1); PLUS(y4, y5);				\
-	PLUS(y8, y9); PLUS(y12, y13);				\
-	    op8;						\
-	    XOR(x3, x0); XOR(x7, x4);				\
-	    XOR(x11, x8); XOR(x15, x12);			\
-	    XOR(y3, y0); XOR(y7, y4);				\
-	    XOR(y11, y8); XOR(y15, y12);			\
-		op9;						\
-		ROTATE(x3,8); ROTATE(x7,8);			\
-		ROTATE(x11,8); ROTATE(x15,8);			\
-		ROTATE(y3,8); ROTATE(y7,8);			\
-		ROTATE(y11,8); ROTATE(y15,8);			\
-	op10;							\
-	PLUS(x2, x3); PLUS(x6, x7);				\
-	PLUS(x10, x11); PLUS(x14, x15);				\
-	PLUS(y2, y3); PLUS(y6, y7);				\
-	PLUS(y10, y11); PLUS(y14, y15);				\
-	    op11;						\
-	    XOR(x1, x2); XOR(x5, x6);				\
-	    XOR(x9, x10); XOR(x13, x14);			\
-	    XOR(y1, y2); XOR(y5, y6);				\
-	    XOR(y9, y10); XOR(y13, y14);			\
-		op12;						\
-		ROTATE(x1,7); ROTATE(x5,7);			\
-		ROTATE(x9,7); ROTATE(x13,7);			\
-		ROTATE(y1,7); ROTATE(y5,7);			\
-		ROTATE(y9,7); ROTATE(y13,7);
-
-#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\
-			 y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \
-	QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
-			      x8,x9,x10,x11,x12,x13,x14,x15,\
-			      y0,y1,y2,y3,y4,y5,y6,y7,\
-			      y8,y9,y10,y11,y12,y13,y14,y15,\
-			      ,,,,,,,,,,,)
-
-#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \
-	  vmrhf tmp0, v0, v1;					\
-	  vmrhf tmp1, v2, v3;					\
-	  vmrlf tmp2, v0, v1;					\
-	  vmrlf   v3, v2, v3;					\
-	  vmrhf tmpa, va, vb;					\
-	  vmrhf tmpb, vc, vd;					\
-	  vmrlf tmpc, va, vb;					\
-	  vmrlf   vd, vc, vd;					\
-	  vpdi v0, tmp0, tmp1, 0;				\
-	  vpdi v1, tmp0, tmp1, 5;				\
-	  vpdi v2, tmp2,   v3, 0;				\
-	  vpdi v3, tmp2,   v3, 5;				\
-	  vpdi va, tmpa, tmpb, 0;				\
-	  vpdi vb, tmpa, tmpb, 5;				\
-	  vpdi vc, tmpc,   vd, 0;				\
-	  vpdi vd, tmpc,   vd, 5;
-
-.balign 8
-.globl __chacha20_s390x_vx_blocks8
-ENTRY (__chacha20_s390x_vx_blocks8)
-	/* input:
-	 *	%r2: input
-	 *	%r3: dst
-	 *	%r4: src
-	 *	%r5: nblks (multiple of 8)
-	 */
-
-	START_STACK(%r8);
-	lgr NBLKS, %r5;
-
-	larl %r7, .Lconsts;
-
-	/* Load counter. */
-	lg %r8, (12 * 4)(INPUT);
-	rllg %r8, %r8, 32;
-
-.balign 4
-	/* Process eight chacha20 blocks per loop. */
-.Lloop8:
-	vlm Y0, Y3, 0(INPUT);
-
-	slgfi NBLKS, 8;
-	lghi ROUND, (20 / 2);
-
-	/* Construct counter vectors X12/X13 & Y12/Y13. */
-	vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7);
-	vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7);
-	vrepf Y12, Y3, 0;
-	vrepf Y13, Y3, 1;
-	vaccf X5, Y12, X4;
-	vaccf Y5, Y12, Y4;
-	vaf X12, Y12, X4;
-	vaf Y12, Y12, Y4;
-	vaf X13, Y13, X5;
-	vaf Y13, Y13, Y5;
-
-	vrepf X0, Y0, 0;
-	vrepf X1, Y0, 1;
-	vrepf X2, Y0, 2;
-	vrepf X3, Y0, 3;
-	vrepf X4, Y1, 0;
-	vrepf X5, Y1, 1;
-	vrepf X6, Y1, 2;
-	vrepf X7, Y1, 3;
-	vrepf X8, Y2, 0;
-	vrepf X9, Y2, 1;
-	vrepf X10, Y2, 2;
-	vrepf X11, Y2, 3;
-	vrepf X14, Y3, 2;
-	vrepf X15, Y3, 3;
-
-	/* Store counters for blocks 0-7. */
-	vstm X12, X13, (STACK_CTR + 0 * 16)(%r15);
-	vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15);
-
-	vlr Y0, X0;
-	vlr Y1, X1;
-	vlr Y2, X2;
-	vlr Y3, X3;
-	vlr Y4, X4;
-	vlr Y5, X5;
-	vlr Y6, X6;
-	vlr Y7, X7;
-	vlr Y8, X8;
-	vlr Y9, X9;
-	vlr Y10, X10;
-	vlr Y11, X11;
-	vlr Y14, X14;
-	vlr Y15, X15;
-
-	/* Update and store counter. */
-	agfi %r8, 8;
-	rllg %r5, %r8, 32;
-	stg %r5, (12 * 4)(INPUT);
-
-.balign 4
-.Lround2_8:
-	QUARTERROUND4_V8(X0, X4,  X8, X12,   X1, X5,  X9, X13,
-			 X2, X6, X10, X14,   X3, X7, X11, X15,
-			 Y0, Y4,  Y8, Y12,   Y1, Y5,  Y9, Y13,
-			 Y2, Y6, Y10, Y14,   Y3, Y7, Y11, Y15);
-	QUARTERROUND4_V8(X0, X5, X10, X15,   X1, X6, X11, X12,
-			 X2, X7,  X8, X13,   X3, X4,  X9, X14,
-			 Y0, Y5, Y10, Y15,   Y1, Y6, Y11, Y12,
-			 Y2, Y7,  Y8, Y13,   Y3, Y4,  Y9, Y14);
-	brctg ROUND, .Lround2_8;
-
-	/* Store blocks 4-7. */
-	vstm Y0, Y15, STACK_Y0_Y15(%r15);
-
-	/* Load counters for blocks 0-3. */
-	vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15);
-
-	lghi ROUND, 1;
-	j .Lfirst_output_4blks_8;
-
-.balign 4
-.Lsecond_output_4blks_8:
-	/* Load blocks 4-7. */
-	vlm X0, X15, STACK_Y0_Y15(%r15);
-
-	/* Load counters for blocks 4-7. */
-	vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15);
-
-	lghi ROUND, 0;
-
-.balign 4
-	/* Output four chacha20 blocks per loop. */
-.Lfirst_output_4blks_8:
-	vlm Y12, Y15, 0(INPUT);
-	PLUS(X12, Y0);
-	PLUS(X13, Y1);
-	vrepf Y0, Y12, 0;
-	vrepf Y1, Y12, 1;
-	vrepf Y2, Y12, 2;
-	vrepf Y3, Y12, 3;
-	vrepf Y4, Y13, 0;
-	vrepf Y5, Y13, 1;
-	vrepf Y6, Y13, 2;
-	vrepf Y7, Y13, 3;
-	vrepf Y8, Y14, 0;
-	vrepf Y9, Y14, 1;
-	vrepf Y10, Y14, 2;
-	vrepf Y11, Y14, 3;
-	vrepf Y14, Y15, 2;
-	vrepf Y15, Y15, 3;
-	PLUS(X0, Y0);
-	PLUS(X1, Y1);
-	PLUS(X2, Y2);
-	PLUS(X3, Y3);
-	PLUS(X4, Y4);
-	PLUS(X5, Y5);
-	PLUS(X6, Y6);
-	PLUS(X7, Y7);
-	PLUS(X8, Y8);
-	PLUS(X9, Y9);
-	PLUS(X10, Y10);
-	PLUS(X11, Y11);
-	PLUS(X14, Y14);
-	PLUS(X15, Y15);
-
-	vl Y15, (.Lbswap32 - .Lconsts)(%r7);
-	TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7,
-			Y9, Y10, Y11, Y12, Y13, Y14);
-	TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15,
-			Y9, Y10, Y11, Y12, Y13, Y14);
-
-	vlm Y0, Y14, 0(SRC);
-	vperm X0, X0, X0, Y15;
-	vperm X1, X1, X1, Y15;
-	vperm X2, X2, X2, Y15;
-	vperm X3, X3, X3, Y15;
-	vperm X4, X4, X4, Y15;
-	vperm X5, X5, X5, Y15;
-	vperm X6, X6, X6, Y15;
-	vperm X7, X7, X7, Y15;
-	vperm X8, X8, X8, Y15;
-	vperm X9, X9, X9, Y15;
-	vperm X10, X10, X10, Y15;
-	vperm X11, X11, X11, Y15;
-	vperm X12, X12, X12, Y15;
-	vperm X13, X13, X13, Y15;
-	vperm X14, X14, X14, Y15;
-	vperm X15, X15, X15, Y15;
-	vl Y15, (15 * 16)(SRC);
-
-	XOR(Y0, X0);
-	XOR(Y1, X4);
-	XOR(Y2, X8);
-	XOR(Y3, X12);
-	XOR(Y4, X1);
-	XOR(Y5, X5);
-	XOR(Y6, X9);
-	XOR(Y7, X13);
-	XOR(Y8, X2);
-	XOR(Y9, X6);
-	XOR(Y10, X10);
-	XOR(Y11, X14);
-	XOR(Y12, X3);
-	XOR(Y13, X7);
-	XOR(Y14, X11);
-	XOR(Y15, X15);
-	vstm Y0, Y15, 0(DST);
-
-	aghi SRC, 256;
-	aghi DST, 256;
-
-	clgije ROUND, 1, .Lsecond_output_4blks_8;
-
-	clgijhe NBLKS, 8, .Lloop8;
-
-
-	END_STACK(%r8);
-	xgr %r2, %r2;
-	br %r14;
-END (__chacha20_s390x_vx_blocks8)
-
-#endif /* HAVE_S390_VX_ASM_SUPPORT */
diff --git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h
deleted file mode 100644
index 0c6abf77e8..0000000000
--- a/sysdeps/s390/s390-64/chacha20_arch.h
+++ /dev/null
@@ -1,45 +0,0 @@
-/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-#include <sys/auxv.h>
-
-unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst,
-					  const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-#ifdef HAVE_S390_VX_ASM_SUPPORT
-  _Static_assert (CHACHA20_BUFSIZE % 8 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 8");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-  if (GLRO(dl_hwcap) & HWCAP_S390_VX)
-    {
-      __chacha20_s390x_vx_blocks8 (state, dst, src,
-				   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-      return;
-    }
-#endif
-  chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c
index 0326ebb767..c8a9ed2d40 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.c
+++ b/sysdeps/unix/sysv/linux/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -26,13 +25,4 @@ __glibc_tls_internal_free (void)
   struct pthread *self = THREAD_SELF;
   free (self->tls_state.strsignal_buf);
   free (self->tls_state.strerror_l_buf);
-
-  if (self->tls_state.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-         cached it won't leak any data.  */
-      explicit_bzero (self->tls_state.rand_state,
-		      sizeof (*self->tls_state.rand_state));
-      free (self->tls_state.rand_state);
-    }
 }
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 1178475d75..c19bef2dec 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -5,13 +5,6 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += link-defines.sym
 endif
 
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
-  chacha20-amd64-sse2 \
-  chacha20-amd64-avx2 \
-  # sysdep_routines
-endif
-
 ifeq ($(subdir),gmon)
 sysdep_routines += _mcount
 # We cannot compile _mcount.S with -pg because that would create
diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S
deleted file mode 100644
index aefd1cdbd0..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-avx2.S
+++ /dev/null
@@ -1,328 +0,0 @@
-/* Optimized AVX2 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-amd64-avx2.S  -  AVX2 implementation of ChaCha20 cipher
-
-   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-
-#ifdef PIC
-#  define rRIP (%rip)
-#else
-#  define rRIP
-#endif
-
-/* register macros */
-#define INPUT %rdi
-#define DST   %rsi
-#define SRC   %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (32)
-#define STACK_VEC_X13 (32 + STACK_VEC_X12)
-#define STACK_TMP     (32 + STACK_VEC_X13)
-#define STACK_TMP1    (32 + STACK_TMP)
-
-#define STACK_MAX     (32 + STACK_TMP1)
-
-/* vector registers */
-#define X0 %ymm0
-#define X1 %ymm1
-#define X2 %ymm2
-#define X3 %ymm3
-#define X4 %ymm4
-#define X5 %ymm5
-#define X6 %ymm6
-#define X7 %ymm7
-#define X8 %ymm8
-#define X9 %ymm9
-#define X10 %ymm10
-#define X11 %ymm11
-#define X12 %ymm12
-#define X13 %ymm13
-#define X14 %ymm14
-#define X15 %ymm15
-
-#define X0h %xmm0
-#define X1h %xmm1
-#define X2h %xmm2
-#define X3h %xmm3
-#define X4h %xmm4
-#define X5h %xmm5
-#define X6h %xmm6
-#define X7h %xmm7
-#define X8h %xmm8
-#define X9h %xmm9
-#define X10h %xmm10
-#define X11h %xmm11
-#define X12h %xmm12
-#define X13h %xmm13
-#define X14h %xmm14
-#define X15h %xmm15
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0,x1,x2,x3,t1,t2) \
-	vpunpckhdq x1, x0, t2; \
-	vpunpckldq x1, x0, x0; \
-	\
-	vpunpckldq x3, x2, t1; \
-	vpunpckhdq x3, x2, x2; \
-	\
-	vpunpckhqdq t1, x0, x1; \
-	vpunpcklqdq t1, x0, x0; \
-	\
-	vpunpckhqdq x2, t2, x3; \
-	vpunpcklqdq x2, t2, x2;
-
-/* 2x2 128-bit matrix transpose */
-#define transpose_16byte_2x2(x0,x1,t1) \
-	vmovdqa    x0, t1; \
-	vperm2i128 $0x20, x1, x0, x0; \
-	vperm2i128 $0x31, x1, t1, x1;
-
-/**********************************************************************
-  8-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp)	\
-	vpsrld $(32 - (c)), v1, tmp;	\
-	vpslld $(c), v1, v1;		\
-	vpaddb tmp, v1, v1;		\
-	vpsrld $(32 - (c)), v2, tmp;	\
-	vpslld $(c), v2, v2;		\
-	vpaddb tmp, v2, v2;
-
-#define ROTATE_SHUF_2(v1,v2,shuf)	\
-	vpshufb shuf, v1, v1;		\
-	vpshufb shuf, v2, v2;
-
-#define XOR(ds,s) \
-	vpxor s, ds, ds;
-
-#define PLUS(ds,s) \
-	vpaddd s, ds, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\
-		      interleave_op1,interleave_op2,\
-		      interleave_op3,interleave_op4)		\
-	vbroadcasti128 .Lshuf_rol16 rRIP, tmp1;			\
-		interleave_op1;					\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE_SHUF_2(d1, d2, tmp1);			\
-		interleave_op2;					\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2, 12, tmp1);				\
-	vbroadcasti128 .Lshuf_rol8 rRIP, tmp1;			\
-		interleave_op3;					\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE_SHUF_2(d1, d2, tmp1);			\
-		interleave_op4;					\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2,  7, tmp1);
-
-	.section .text.avx2, "ax", @progbits
-	.align 32
-chacha20_data:
-L(shuf_rol16):
-	.byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
-L(shuf_rol8):
-	.byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
-L(inc_counter):
-	.byte 0,1,2,3,4,5,6,7
-L(unsigned_cmp):
-	.long 0x80000000
-
-	.hidden __chacha20_avx2_blocks8
-ENTRY (__chacha20_avx2_blocks8)
-	/* input:
-	 *	%rdi: input
-	 *	%rsi: dst
-	 *	%rdx: src
-	 *	%rcx: nblks (multiple of 8)
-	 */
-	vzeroupper;
-
-	pushq %rbp;
-	cfi_adjust_cfa_offset(8);
-	cfi_rel_offset(rbp, 0)
-	movq %rsp, %rbp;
-	cfi_def_cfa_register(rbp);
-
-	subq $STACK_MAX, %rsp;
-	andq $~31, %rsp;
-
-L(loop8):
-	mov $20, ROUND;
-
-	/* Construct counter vectors X12 and X13 */
-	vpmovzxbd L(inc_counter) rRIP, X0;
-	vpbroadcastd L(unsigned_cmp) rRIP, X2;
-	vpbroadcastd (12 * 4)(INPUT), X12;
-	vpbroadcastd (13 * 4)(INPUT), X13;
-	vpaddd X0, X12, X12;
-	vpxor X2, X0, X0;
-	vpxor X2, X12, X1;
-	vpcmpgtd X1, X0, X0;
-	vpsubd X0, X13, X13;
-	vmovdqa X12, (STACK_VEC_X12)(%rsp);
-	vmovdqa X13, (STACK_VEC_X13)(%rsp);
-
-	/* Load vectors */
-	vpbroadcastd (0 * 4)(INPUT), X0;
-	vpbroadcastd (1 * 4)(INPUT), X1;
-	vpbroadcastd (2 * 4)(INPUT), X2;
-	vpbroadcastd (3 * 4)(INPUT), X3;
-	vpbroadcastd (4 * 4)(INPUT), X4;
-	vpbroadcastd (5 * 4)(INPUT), X5;
-	vpbroadcastd (6 * 4)(INPUT), X6;
-	vpbroadcastd (7 * 4)(INPUT), X7;
-	vpbroadcastd (8 * 4)(INPUT), X8;
-	vpbroadcastd (9 * 4)(INPUT), X9;
-	vpbroadcastd (10 * 4)(INPUT), X10;
-	vpbroadcastd (11 * 4)(INPUT), X11;
-	vpbroadcastd (14 * 4)(INPUT), X14;
-	vpbroadcastd (15 * 4)(INPUT), X15;
-	vmovdqa X15, (STACK_TMP)(%rsp);
-
-L(round2):
-	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X15,,,,)
-	vmovdqa (STACK_TMP)(%rsp), X15;
-	vmovdqa X8, (STACK_TMP)(%rsp);
-	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,,,,)
-	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,,,,)
-	vmovdqa (STACK_TMP)(%rsp), X8;
-	vmovdqa X15, (STACK_TMP)(%rsp);
-	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X15,,,,)
-	sub $2, ROUND;
-	jnz L(round2);
-
-	vmovdqa X8, (STACK_TMP1)(%rsp);
-
-	/* tmp := X15 */
-	vpbroadcastd (0 * 4)(INPUT), X15;
-	PLUS(X0, X15);
-	vpbroadcastd (1 * 4)(INPUT), X15;
-	PLUS(X1, X15);
-	vpbroadcastd (2 * 4)(INPUT), X15;
-	PLUS(X2, X15);
-	vpbroadcastd (3 * 4)(INPUT), X15;
-	PLUS(X3, X15);
-	vpbroadcastd (4 * 4)(INPUT), X15;
-	PLUS(X4, X15);
-	vpbroadcastd (5 * 4)(INPUT), X15;
-	PLUS(X5, X15);
-	vpbroadcastd (6 * 4)(INPUT), X15;
-	PLUS(X6, X15);
-	vpbroadcastd (7 * 4)(INPUT), X15;
-	PLUS(X7, X15);
-	transpose_4x4(X0, X1, X2, X3, X8, X15);
-	transpose_4x4(X4, X5, X6, X7, X8, X15);
-	vmovdqa (STACK_TMP1)(%rsp), X8;
-	transpose_16byte_2x2(X0, X4, X15);
-	transpose_16byte_2x2(X1, X5, X15);
-	transpose_16byte_2x2(X2, X6, X15);
-	transpose_16byte_2x2(X3, X7, X15);
-	vmovdqa (STACK_TMP)(%rsp), X15;
-	vmovdqu X0, (64 * 0 + 16 * 0)(DST)
-	vmovdqu X1, (64 * 1 + 16 * 0)(DST)
-	vpbroadcastd (8 * 4)(INPUT), X0;
-	PLUS(X8, X0);
-	vpbroadcastd (9 * 4)(INPUT), X0;
-	PLUS(X9, X0);
-	vpbroadcastd (10 * 4)(INPUT), X0;
-	PLUS(X10, X0);
-	vpbroadcastd (11 * 4)(INPUT), X0;
-	PLUS(X11, X0);
-	vmovdqa (STACK_VEC_X12)(%rsp), X0;
-	PLUS(X12, X0);
-	vmovdqa (STACK_VEC_X13)(%rsp), X0;
-	PLUS(X13, X0);
-	vpbroadcastd (14 * 4)(INPUT), X0;
-	PLUS(X14, X0);
-	vpbroadcastd (15 * 4)(INPUT), X0;
-	PLUS(X15, X0);
-	vmovdqu X2, (64 * 2 + 16 * 0)(DST)
-	vmovdqu X3, (64 * 3 + 16 * 0)(DST)
-
-	/* Update counter */
-	addq $8, (12 * 4)(INPUT);
-
-	transpose_4x4(X8, X9, X10, X11, X0, X1);
-	transpose_4x4(X12, X13, X14, X15, X0, X1);
-	vmovdqu X4, (64 * 4 + 16 * 0)(DST)
-	vmovdqu X5, (64 * 5 + 16 * 0)(DST)
-	transpose_16byte_2x2(X8, X12, X0);
-	transpose_16byte_2x2(X9, X13, X0);
-	transpose_16byte_2x2(X10, X14, X0);
-	transpose_16byte_2x2(X11, X15, X0);
-	vmovdqu X6,  (64 * 6 + 16 * 0)(DST)
-	vmovdqu X7,  (64 * 7 + 16 * 0)(DST)
-	vmovdqu X8,  (64 * 0 + 16 * 2)(DST)
-	vmovdqu X9,  (64 * 1 + 16 * 2)(DST)
-	vmovdqu X10, (64 * 2 + 16 * 2)(DST)
-	vmovdqu X11, (64 * 3 + 16 * 2)(DST)
-	vmovdqu X12, (64 * 4 + 16 * 2)(DST)
-	vmovdqu X13, (64 * 5 + 16 * 2)(DST)
-	vmovdqu X14, (64 * 6 + 16 * 2)(DST)
-	vmovdqu X15, (64 * 7 + 16 * 2)(DST)
-
-	sub $8, NBLKS;
-	lea (8 * 64)(DST), DST;
-	lea (8 * 64)(SRC), SRC;
-	jnz L(loop8);
-
-	vzeroupper;
-
-	/* eax zeroed by round loop. */
-	leave;
-	cfi_adjust_cfa_offset(-8)
-	cfi_def_cfa_register(%rsp);
-	ret;
-	int3;
-END(__chacha20_avx2_blocks8)
diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S
deleted file mode 100644
index 351a1109c6..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-sse2.S
+++ /dev/null
@@ -1,311 +0,0 @@
-/* Optimized SSE2 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-amd64-ssse3.S  -  SSSE3 implementation of ChaCha20 cipher
-
-   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-#include <isa-level.h>
-
-#if MINIMUM_X86_ISA_LEVEL <= 2
-
-#ifdef PIC
-#  define rRIP (%rip)
-#else
-#  define rRIP
-#endif
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
-        ret; int3;
-
-/* register macros */
-#define INPUT %rdi
-#define DST   %rsi
-#define SRC   %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (16)
-#define STACK_VEC_X13 (16 + STACK_VEC_X12)
-#define STACK_TMP     (16 + STACK_VEC_X13)
-#define STACK_TMP1    (16 + STACK_TMP)
-#define STACK_TMP2    (16 + STACK_TMP1)
-
-#define STACK_MAX     (16 + STACK_TMP2)
-
-/* vector registers */
-#define X0 %xmm0
-#define X1 %xmm1
-#define X2 %xmm2
-#define X3 %xmm3
-#define X4 %xmm4
-#define X5 %xmm5
-#define X6 %xmm6
-#define X7 %xmm7
-#define X8 %xmm8
-#define X9 %xmm9
-#define X10 %xmm10
-#define X11 %xmm11
-#define X12 %xmm12
-#define X13 %xmm13
-#define X14 %xmm14
-#define X15 %xmm15
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \
-	movdqa    x0, t2; \
-	punpckhdq x1, t2; \
-	punpckldq x1, x0; \
-	\
-	movdqa    x2, t1; \
-	punpckldq x3, t1; \
-	punpckhdq x3, x2; \
-	\
-	movdqa     x0, x1; \
-	punpckhqdq t1, x1; \
-	punpcklqdq t1, x0; \
-	\
-	movdqa     t2, x3; \
-	punpckhqdq x2, x3; \
-	punpcklqdq x2, t2; \
-	movdqa     t2, x2;
-
-/* fill xmm register with 32-bit value from memory */
-#define PBROADCASTD(mem32, xreg) \
-	movd mem32, xreg; \
-	pshufd $0, xreg, xreg;
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp1,tmp2)	\
-	movdqa v1, tmp1; 		\
-	movdqa v2, tmp2; 		\
-	psrld $(32 - (c)), v1;		\
-	pslld $(c), tmp1;		\
-	paddb tmp1, v1;			\
-	psrld $(32 - (c)), v2;		\
-	pslld $(c), tmp2;		\
-	paddb tmp2, v2;
-
-#define XOR(ds,s) \
-	pxor s, ds;
-
-#define PLUS(ds,s) \
-	paddd s, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2)	\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE2(d1, d2, 16, tmp1, tmp2);			\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2, 12, tmp1, tmp2);			\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE2(d1, d2, 8, tmp1, tmp2);			\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2,  7, tmp1, tmp2);
-
-	.section .text.sse2,"ax",@progbits
-
-chacha20_data:
-	.align 16
-L(counter1):
-	.long 1,0,0,0
-L(inc_counter):
-	.long 0,1,2,3
-L(unsigned_cmp):
-	.long 0x80000000,0x80000000,0x80000000,0x80000000
-
-	.hidden __chacha20_sse2_blocks4
-ENTRY (__chacha20_sse2_blocks4)
-	/* input:
-	 *	%rdi: input
-	 *	%rsi: dst
-	 *	%rdx: src
-	 *	%rcx: nblks (multiple of 4)
-	 */
-
-	pushq %rbp;
-	cfi_adjust_cfa_offset(8);
-	cfi_rel_offset(rbp, 0)
-	movq %rsp, %rbp;
-	cfi_def_cfa_register(%rbp);
-
-	subq $STACK_MAX, %rsp;
-	andq $~15, %rsp;
-
-L(loop4):
-	mov $20, ROUND;
-
-	/* Construct counter vectors X12 and X13 */
-	movdqa L(inc_counter) rRIP, X0;
-	movdqa L(unsigned_cmp) rRIP, X2;
-	PBROADCASTD((12 * 4)(INPUT), X12);
-	PBROADCASTD((13 * 4)(INPUT), X13);
-	paddd X0, X12;
-	movdqa X12, X1;
-	pxor X2, X0;
-	pxor X2, X1;
-	pcmpgtd X1, X0;
-	psubd X0, X13;
-	movdqa X12, (STACK_VEC_X12)(%rsp);
-	movdqa X13, (STACK_VEC_X13)(%rsp);
-
-	/* Load vectors */
-	PBROADCASTD((0 * 4)(INPUT), X0);
-	PBROADCASTD((1 * 4)(INPUT), X1);
-	PBROADCASTD((2 * 4)(INPUT), X2);
-	PBROADCASTD((3 * 4)(INPUT), X3);
-	PBROADCASTD((4 * 4)(INPUT), X4);
-	PBROADCASTD((5 * 4)(INPUT), X5);
-	PBROADCASTD((6 * 4)(INPUT), X6);
-	PBROADCASTD((7 * 4)(INPUT), X7);
-	PBROADCASTD((8 * 4)(INPUT), X8);
-	PBROADCASTD((9 * 4)(INPUT), X9);
-	PBROADCASTD((10 * 4)(INPUT), X10);
-	PBROADCASTD((11 * 4)(INPUT), X11);
-	PBROADCASTD((14 * 4)(INPUT), X14);
-	PBROADCASTD((15 * 4)(INPUT), X15);
-	movdqa X11, (STACK_TMP)(%rsp);
-	movdqa X15, (STACK_TMP1)(%rsp);
-
-L(round2_4):
-	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X11,X15)
-	movdqa (STACK_TMP)(%rsp), X11;
-	movdqa (STACK_TMP1)(%rsp), X15;
-	movdqa X8, (STACK_TMP)(%rsp);
-	movdqa X9, (STACK_TMP1)(%rsp);
-	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,X9)
-	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,X9)
-	movdqa (STACK_TMP)(%rsp), X8;
-	movdqa (STACK_TMP1)(%rsp), X9;
-	movdqa X11, (STACK_TMP)(%rsp);
-	movdqa X15, (STACK_TMP1)(%rsp);
-	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X11,X15)
-	sub $2, ROUND;
-	jnz L(round2_4);
-
-	/* tmp := X15 */
-	movdqa (STACK_TMP)(%rsp), X11;
-	PBROADCASTD((0 * 4)(INPUT), X15);
-	PLUS(X0, X15);
-	PBROADCASTD((1 * 4)(INPUT), X15);
-	PLUS(X1, X15);
-	PBROADCASTD((2 * 4)(INPUT), X15);
-	PLUS(X2, X15);
-	PBROADCASTD((3 * 4)(INPUT), X15);
-	PLUS(X3, X15);
-	PBROADCASTD((4 * 4)(INPUT), X15);
-	PLUS(X4, X15);
-	PBROADCASTD((5 * 4)(INPUT), X15);
-	PLUS(X5, X15);
-	PBROADCASTD((6 * 4)(INPUT), X15);
-	PLUS(X6, X15);
-	PBROADCASTD((7 * 4)(INPUT), X15);
-	PLUS(X7, X15);
-	PBROADCASTD((8 * 4)(INPUT), X15);
-	PLUS(X8, X15);
-	PBROADCASTD((9 * 4)(INPUT), X15);
-	PLUS(X9, X15);
-	PBROADCASTD((10 * 4)(INPUT), X15);
-	PLUS(X10, X15);
-	PBROADCASTD((11 * 4)(INPUT), X15);
-	PLUS(X11, X15);
-	movdqa (STACK_VEC_X12)(%rsp), X15;
-	PLUS(X12, X15);
-	movdqa (STACK_VEC_X13)(%rsp), X15;
-	PLUS(X13, X15);
-	movdqa X13, (STACK_TMP)(%rsp);
-	PBROADCASTD((14 * 4)(INPUT), X15);
-	PLUS(X14, X15);
-	movdqa (STACK_TMP1)(%rsp), X15;
-	movdqa X14, (STACK_TMP1)(%rsp);
-	PBROADCASTD((15 * 4)(INPUT), X13);
-	PLUS(X15, X13);
-	movdqa X15, (STACK_TMP2)(%rsp);
-
-	/* Update counter */
-	addq $4, (12 * 4)(INPUT);
-
-	TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15);
-	movdqu X0, (64 * 0 + 16 * 0)(DST)
-	movdqu X1, (64 * 1 + 16 * 0)(DST)
-	movdqu X2, (64 * 2 + 16 * 0)(DST)
-	movdqu X3, (64 * 3 + 16 * 0)(DST)
-	TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2);
-	movdqa (STACK_TMP)(%rsp), X13;
-	movdqa (STACK_TMP1)(%rsp), X14;
-	movdqa (STACK_TMP2)(%rsp), X15;
-	movdqu X4, (64 * 0 + 16 * 1)(DST)
-	movdqu X5, (64 * 1 + 16 * 1)(DST)
-	movdqu X6, (64 * 2 + 16 * 1)(DST)
-	movdqu X7, (64 * 3 + 16 * 1)(DST)
-	TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2);
-	movdqu X8,  (64 * 0 + 16 * 2)(DST)
-	movdqu X9,  (64 * 1 + 16 * 2)(DST)
-	movdqu X10, (64 * 2 + 16 * 2)(DST)
-	movdqu X11, (64 * 3 + 16 * 2)(DST)
-	TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2);
-	movdqu X12, (64 * 0 + 16 * 3)(DST)
-	movdqu X13, (64 * 1 + 16 * 3)(DST)
-	movdqu X14, (64 * 2 + 16 * 3)(DST)
-	movdqu X15, (64 * 3 + 16 * 3)(DST)
-
-	sub $4, NBLKS;
-	lea (4 * 64)(DST), DST;
-	lea (4 * 64)(SRC), SRC;
-	jnz L(loop4);
-
-	/* eax zeroed by round loop. */
-	leave;
-	cfi_adjust_cfa_offset(-8)
-	cfi_def_cfa_register(%rsp);
-	ret_spec_stop;
-END (__chacha20_sse2_blocks4)
-
-#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */
diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
deleted file mode 100644
index 6f3784e392..0000000000
--- a/sysdeps/x86_64/chacha20_arch.h
+++ /dev/null
@@ -1,55 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <isa-level.h>
-#include <ldsodefs.h>
-#include <cpu-features.h>
-#include <sys/param.h>
-
-unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4 or 8");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-#if MINIMUM_X86_ISA_LEVEL > 2
-  __chacha20_avx2_blocks8 (state, dst, src,
-			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  const struct cpu_features* cpu_features = __get_cpu_features ();
-
-  /* AVX2 version uses vzeroupper, so disable it if RTM is enabled.  */
-  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
-      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !))
-    __chacha20_avx2_blocks8 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    __chacha20_sse2_blocks4 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#endif
-}
-- 
2.35.1



* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-25 23:28     ` [PATCH v2] " Jason A. Donenfeld
@ 2022-07-25 23:59       ` Eric Biggers
  2022-07-26 10:26         ` Jason A. Donenfeld
  2022-07-26  1:10       ` Mark Harris
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 81+ messages in thread
From: Eric Biggers @ 2022-07-25 23:59 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: libc-alpha, Adhemerval Zanella Netto, Florian Weimer,
	Cristian Rodríguez, Paul Eggert, linux-crypto

On Tue, Jul 26, 2022 at 01:28:10AM +0200, Jason A. Donenfeld wrote:
> Rather than buffering 16 MiB of entropy in userspace (by way of
> chacha20), simply call getrandom() every time.
> 
> This approach is doubtlessly slower, for now, but trying to prematurely
> optimize arc4random appears to be leading toward all sorts of nasty
> properties and gotchas. Instead, this patch takes a much more
> conservative approach. The interface is added as a basic loop wrapper
> around getrandom(), and then later, the kernel and libc can work
> together on optimizing that.
> 
> This prevents numerous issues in which userspace is unaware of when it
> really must throw away its buffer, since we avoid buffering
> altogether. Future improvements may include userspace learning more from
> the kernel about when to do that, which might make these sorts of
> chacha20-based optimizations more possible. The current heuristic of 16
> MiB is meaningless garbage that doesn't correspond to anything the
> kernel might know about. So for now, let's just do something
> conservative that we know is correct and won't lead to cryptographic
> issues for users of this function.
> 
> This patch might be considered along the lines of, "optimization is the
> root of all evil," in that the much more complex implementation it
> replaces moves too fast without considering security implications,
> whereas the incremental approach done here is a much safer way of going
> about things. Once this lands, we can take our time in optimizing this
> properly using new interplay between the kernel and userspace.
> 
> getrandom(0) is used, since that's the one that ensures the bytes
> returned are cryptographically secure. But on systems without it, we
> fall back to using /dev/urandom. This is unfortunate because it means
> opening a file descriptor, but there's not much of a choice. Secondly,
> as part of the fallback, in order to get more or less the same
> properties of getrandom(0), we poll on /dev/random, and if the poll
> succeeds at least once, then we assume the RNG is initialized. This is a
> rough approximation, as the ancient "non-blocking pool" initialized
> after the "blocking pool", not before, but it's the best approximation
> we can do.
> 
> The motivation for including arc4random, in the first place, is to have
> source-level compatibility with existing code. That means this patch
> doesn't attempt to litigate the interface itself. It does, however,
> choose a conservative approach for implementing it.
> 
> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Florian Weimer <fweimer@redhat.com>
> Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
> Cc: Paul Eggert <eggert@cs.ucla.edu>
> Cc: linux-crypto@vger.kernel.org
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

This looks good to me.

There are still a few bits that need to be removed/updated.  With a quick grep,
I found:

sysdeps/generic/tls-internal-struct.h:  struct arc4random_state_t *rand_state;

sysdeps/unix/sysv/linux/tls-internal.h:/* Reset the arc4random TCB state on fork.  *

NEWS: ... The functions use a pseudo-random number generator along with
NEWS: entropy from the kernel.


Also, the documentation in manual/math.texi should say that the randomness is
cryptographically secure.

- Eric


* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-25 23:59       ` Eric Biggers
@ 2022-07-26 10:26         ` Jason A. Donenfeld
  0 siblings, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 10:26 UTC (permalink / raw)
  To: Eric Biggers
  Cc: libc-alpha, Adhemerval Zanella Netto, Florian Weimer,
	Cristian Rodríguez, Paul Eggert, linux-crypto

Hi Eric,

On Mon, Jul 25, 2022 at 04:59:17PM -0700, Eric Biggers wrote:
> This looks good to me.
> 
> There are still a few bits that need to be removed/updated.  With a quick grep,
> I found:
> 
> sysdeps/generic/tls-internal-struct.h:  struct arc4random_state_t *rand_state;
> 
> sysdeps/unix/sysv/linux/tls-internal.h:/* Reset the arc4random TCB state on fork.  *
> 
> NEWS: ... The functions use a pseudo-random number generator along with
> NEWS: entropy from the kernel.
> 
> 
> Also, the documentation in manual/math.texi should say that the randomness is
> cryptographically secure.

Thanks for the notes. I'll clean that all up in v3.

Jason


* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-25 23:28     ` [PATCH v2] " Jason A. Donenfeld
  2022-07-25 23:59       ` Eric Biggers
@ 2022-07-26  1:10       ` Mark Harris
  2022-07-26 10:41         ` Jason A. Donenfeld
  2022-07-26  9:55       ` Florian Weimer
  2022-07-26 11:33       ` Adhemerval Zanella Netto
  3 siblings, 1 reply; 81+ messages in thread
From: Mark Harris @ 2022-07-26  1:10 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: libc-alpha, Florian Weimer, linux-crypto

Jason A. Donenfeld wrote:
> +      l = __getrandom_nocancel (p, n, 0);
> +      if (l > 0)
> +       {
> +         if ((size_t) l == n)
> +           return; /* Done reading, success. */
> +         p = (uint8_t *) p + l;
> +         n -= l;
> +         continue; /* Interrupted by a signal; keep going. */
> +       }
> +      else if (l == 0)
> +       arc4random_getrandom_failure (); /* Weird, should never happen. */
> +      else if (errno == ENOSYS)
> +       {
> +         have_getrandom = false;
> +         break; /* No syscall, so fallback to /dev/urandom. */
> +       }
> +      arc4random_getrandom_failure (); /* Unknown error, should never happen. */

Isn't EINTR also possible?  Aborting in that case does not seem reasonable.

Also the __getrandom_nocancel function does not set errno on Linux; it
just returns INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags).
So unless that is changed, it doesn't look like this ENOSYS check will
detect old Linux kernels.

> +      struct pollfd pfd = { .events = POLLIN };
> +      pfd.fd = TEMP_FAILURE_RETRY (
> +         __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> +      if (pfd.fd < 0)
> +       arc4random_getrandom_failure ();
> +      if (__poll (&pfd, 1, -1) < 0)
> +       arc4random_getrandom_failure ();
> +      if (__close_nocancel (pfd.fd) < 0)
> +       arc4random_getrandom_failure ();

The TEMP_FAILURE_RETRY handles EINTR on open, but __poll can also
result in EINTR.
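
The retry pattern under discussion can be sketched as follows. This is a minimal illustration, not the patch's code: it assumes the public getrandom() wrapper from <sys/random.h>, which does set errno, rather than the internal __getrandom_nocancel, and the helper name fill_random is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: fill buf with n random bytes.
   Returns 0 on success, -1 on any failure other than EINTR
   (errno is left as getrandom() set it).  */
static int
fill_random (void *buf, size_t n)
{
  uint8_t *p = buf;

  while (n > 0)
    {
      ssize_t l = getrandom (p, n, 0);
      if (l > 0)
        {
          /* Possibly a short read: advance and ask for the rest.  */
          p += l;
          n -= (size_t) l;
        }
      else if (l < 0 && errno == EINTR)
        continue;       /* Interrupted before any bytes arrived; retry.  */
      else
        return -1;      /* ENOSYS, EFAULT, or an unexpected zero return.  */
    }
  return 0;
}
```

A caller that wants the fallback path would check for errno == ENOSYS after a -1 return instead of aborting.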


 - Mark


* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26  1:10       ` Mark Harris
@ 2022-07-26 10:41         ` Jason A. Donenfeld
  2022-07-26 11:06           ` Florian Weimer
  2022-07-26 16:51           ` Mark Harris
  0 siblings, 2 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 10:41 UTC (permalink / raw)
  To: Mark Harris; +Cc: libc-alpha, Florian Weimer, linux-crypto

Hi Mark,

On Mon, Jul 25, 2022 at 06:10:06PM -0700, Mark Harris wrote:
> Jason A. Donenfeld wrote:
> > +      l = __getrandom_nocancel (p, n, 0);
> > +      if (l > 0)
> > +       {
> > +         if ((size_t) l == n)
> > +           return; /* Done reading, success. */
> > +         p = (uint8_t *) p + l;
> > +         n -= l;
> > +         continue; /* Interrupted by a signal; keep going. */
> > +       }
> > +      else if (l == 0)
> > +       arc4random_getrandom_failure (); /* Weird, should never happen. */
> > +      else if (errno == ENOSYS)
> > +       {
> > +         have_getrandom = false;
> > +         break; /* No syscall, so fallback to /dev/urandom. */
> > +       }
> > +      arc4random_getrandom_failure (); /* Unknown error, should never happen. */
> 
> Isn't EINTR also possible?  Aborting in that case does not seem reasonable.

Not in current kernels, where it always returns at least PAGE_SIZE bytes
before checking for pending signals. In older kernels, if there was a
signal pending at the top, it would do no work and return -ERESTARTSYS,
which I believe should then get restarted by glibc's syscaller? I might
be wrong about how restarts work though, so if you know better, please
let me know. TEMP_FAILURE_RETRY relies on errno, so that's not what we
want. I guess I can just add a case for it.

> Also the __getrandom_nocancel function does not set errno on Linux; it
> just returns INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags).
> So unless that is changed, it doesn't look like this ENOSYS check will
> detect old Linux kernels.

Thanks. It looks like INTERNAL_SYSCALL_CALL just returns the errno as-is
as a return value, right? I'll adjust the code to account for that.

> > +      struct pollfd pfd = { .events = POLLIN };
> > +      pfd.fd = TEMP_FAILURE_RETRY (
> > +         __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> > +      if (pfd.fd < 0)
> > +       arc4random_getrandom_failure ();
> > +      if (__poll (&pfd, 1, -1) < 0)
> > +       arc4random_getrandom_failure ();
> > +      if (__close_nocancel (pfd.fd) < 0)
> > +       arc4random_getrandom_failure ();
> 
> The TEMP_FAILURE_RETRY handles EINTR on open, but __poll can also
> result in EINTR.

Thanks. I'll surround the __poll in TEMP_FAILURE_RETRY.
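
For reference, here is a sketch of what that retry amounts to, written as an explicit loop so that it does not depend on _GNU_SOURCE being defined. The helper name wait_readable is hypothetical; in the fallback above the descriptor would be the one opened on /dev/random.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: block until fd reports POLLIN, retrying whenever
   a signal interrupts poll().  Returns 0 once POLLIN is reported and
   -1 otherwise.  */
static int
wait_readable (int fd)
{
  struct pollfd pfd = { .fd = fd, .events = POLLIN };
  int r;

  /* Same effect as TEMP_FAILURE_RETRY (poll (&pfd, 1, -1)).  */
  do
    r = poll (&pfd, 1, -1);
  while (r < 0 && errno == EINTR);

  return (r == 1 && (pfd.revents & POLLIN)) ? 0 : -1;
}
```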

Thank you for the review! v3 will have the above changes.

Jason


* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 10:41         ` Jason A. Donenfeld
@ 2022-07-26 11:06           ` Florian Weimer
  2022-07-26 16:51           ` Mark Harris
  1 sibling, 0 replies; 81+ messages in thread
From: Florian Weimer @ 2022-07-26 11:06 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Mark Harris, libc-alpha, linux-crypto

* Jason A. Donenfeld:

> Not in current kernels, where it always returns at least PAGE_SIZE bytes
> before checking for pending signals. In older kernels, if there was a
> signal pending at the top, it would do no work and return -ERESTARTSYS,
> which I believe should then get restarted by glibc's syscaller?

glibc does not handle ERESTARTSYS, it's a kernel-internal error code
that's not exported in UAPI headers and must not leak to userspace
(except perhaps via ptrace).  I believe restarts are handled in the
kernel signal code, by tweaking the program counter.  Looking at that,
ERESTARTSYS gets translated to EINTR for !SA_RESTART system calls:

        /* Are we from a system call? */
        if (syscall_get_nr(current, regs) != -1) {
                /* If so, check system call restarting.. */
                switch (syscall_get_error(current, regs)) {
                case -ERESTART_RESTARTBLOCK:
                case -ERESTARTNOHAND:
                        regs->ax = -EINTR;
                        break;

                case -ERESTARTSYS:
                        if (!(ksig->ka.sa.sa_flags & SA_RESTART)) {
                                regs->ax = -EINTR;
                                break;
                        }
                        fallthrough;
                case -ERESTARTNOINTR:
                        regs->ax = regs->orig_ax;
                        regs->ip -= 2;
                        break;
                }
        }

(arch/x86/kernel/signal.c)

Thanks,
Florian



* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 10:41         ` Jason A. Donenfeld
  2022-07-26 11:06           ` Florian Weimer
@ 2022-07-26 16:51           ` Mark Harris
  2022-07-26 18:42             ` Jason A. Donenfeld
  1 sibling, 1 reply; 81+ messages in thread
From: Mark Harris @ 2022-07-26 16:51 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: libc-alpha, Florian Weimer, linux-crypto

Jason A. Donenfeld wrote:
> On Mon, Jul 25, 2022 at 06:10:06PM -0700, Mark Harris wrote:
> > Jason A. Donenfeld wrote:
> > > +      l = __getrandom_nocancel (p, n, 0);
> > > +      if (l > 0)
> > > +       {
> > > +         if ((size_t) l == n)
> > > +           return; /* Done reading, success. */
> > > +         p = (uint8_t *) p + l;
> > > +         n -= l;
> > > +         continue; /* Interrupted by a signal; keep going. */
> > > +       }
> > > +      else if (l == 0)
> > > +       arc4random_getrandom_failure (); /* Weird, should never happen. */
> > > +      else if (errno == ENOSYS)
> > > +       {
> > > +         have_getrandom = false;
> > > +         break; /* No syscall, so fallback to /dev/urandom. */
> > > +       }
> > > +      arc4random_getrandom_failure (); /* Unknown error, should never happen. */
> >
> > Isn't EINTR also possible?  Aborting in that case does not seem reasonable.
>
> Not in current kernels, where it always returns at least PAGE_SIZE bytes
> before checking for pending signals. In older kernels, if there was a
> signal pending at the top, it would do no work and return -ERESTARTSYS,
> which I believe should then get restarted by glibc's syscaller? I might
> be wrong about how restarts work though, so if you know better, please
> let me know. TEMP_FAILURE_RETRY relies on errno, so that's not what we
> want. I guess I can just add a case for it.
>
> > Also the __getrandom_nocancel function does not set errno on Linux; it
> > just returns INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags).
> > So unless that is changed, it doesn't look like this ENOSYS check will
> > detect old Linux kernels.
>
> Thanks. It looks like INTERNAL_SYSCALL_CALL just returns the errno as-is
> as a return value, right? I'll adjust the code to account for that.

Yes, INTERNAL_SYSCALL_CALL just returns the negated errno value that it
gets from the Linux kernel, but only on Linux does
__getrandom_nocancel use that.  The Hurd and generic implementations
set errno on error.  Previously the only call to this function did not
care about the specific error value so it didn't matter.  Since you
are now using the error value in generic code, __getrandom_nocancel
should be changed on Linux to set errno like most other _nocancel
calls, and then it should go back to checking errno here.
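
The difference between the two conventions can be sketched like this. The names raw_to_libc and RAW_ERRNO_MAX are hypothetical and this is not glibc's macro; it only illustrates the translation that an errno-setting wrapper performs on a raw Linux return value, where the range [-4095, -1] is reserved for error codes.

```c
#include <errno.h>

/* Linux reserves raw return values in [-4095, -1] for error codes.  */
#define RAW_ERRNO_MAX 4095L

/* Hypothetical helper: convert a raw kernel return value to the libc
   convention, i.e. -1 with errno set on error and the value itself
   otherwise.  */
static long
raw_to_libc (long raw)
{
  if (raw < 0 && raw >= -RAW_ERRNO_MAX)
    {
      errno = (int) -raw;
      return -1;
    }
  return raw;
}
```

Generic code that tests errno == ENOSYS only works against the second convention; against a raw return value it would have to compare the result with -ENOSYS instead.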

And as Adhemerval mentioned, you only added a Linux implementation of
__ppoll_infinity_nocancel, but are calling it from generic code.

Also, by the way, your patches cc'd directly to me get quarantined
because DKIM signature verification failed.  The non-patch messages
pass DKIM and are fine.



 - Mark


* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 16:51           ` Mark Harris
@ 2022-07-26 18:42             ` Jason A. Donenfeld
  2022-07-26 19:18               ` Adhemerval Zanella Netto
  2022-07-26 19:24               ` Jason A. Donenfeld
  0 siblings, 2 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 18:42 UTC (permalink / raw)
  To: Mark Harris; +Cc: libc-alpha, Florian Weimer, linux-crypto

Hi Mark,

On Tue, Jul 26, 2022 at 09:51:03AM -0700, Mark Harris wrote:
> > Thanks. It looks like INTERNAL_SYSCALL_CALL just returns the errno as-is
> > as a return value, right? I'll adjust the code to account for that.
> 
> Yes INTERNAL_SYSCALL_CALL just returns the negated errno value that it
> gets from the Linux kernel, but only on Linux does
> __getrandom_nocancel use that.  The Hurd and generic implementations
> set errno on error.  Previously the only call to this function did not
> care about the specific error value so it didn't matter.  Since you
> are now using the error value in generic code, __getrandom_nocancel
> should be changed on Linux to set errno like most other _nocancel
> calls, and then it should go back to checking errno here.
> 
> And as Adhemerval mentioned, you only added a Linux implementation of
> __ppoll_infinity_nocancel, but are calling it from generic code.

Okay, I'll switch this to use INLINE_SYSCALL_CALL, so that it sets
errno, and then will use the normal TEMP_FAILURE_RETRY macro for EINTR.

> Also, by the way your patches cc'd directly to me get quarantined
> because DKIM signature verification failed.  The non-patch messages
> pass DKIM and are fine.

That sure is odd. The emails are all going through the MTA. rspamd bug?
OpenSMTPD bug? Hmm...

Jason

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 18:42             ` Jason A. Donenfeld
@ 2022-07-26 19:18               ` Adhemerval Zanella Netto
  2022-07-26 19:24               ` Jason A. Donenfeld
  1 sibling, 0 replies; 81+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-26 19:18 UTC (permalink / raw)
  To: libc-alpha



On 26/07/22 15:42, Jason A. Donenfeld via Libc-alpha wrote:
> Hi Mark,
> 
> On Tue, Jul 26, 2022 at 09:51:03AM -0700, Mark Harris wrote:
>>> Thanks. It looks like INTERNAL_SYSCALL_CALL just returns the errno as-is
>>> as a return value, right? I'll adjust the code to account for that.
>>
>> Yes INTERNAL_SYSCALL_CALL just returns the negated errno value that it
>> gets from the Linux kernel, but only on Linux does
>> __getrandom_nocancel use that.  The Hurd and generic implementations
>> set errno on error.  Previously the only call to this function did not
>> care about the specific error value so it didn't matter.  Since you
>> are now using the error value in generic code, __getrandom_nocancel
>> should be changed on Linux to set errno like most other _nocancel
>> calls, and then it should go back to checking errno here.
>>
>> And as Adhemerval mentioned, you only added a Linux implementation of
>> __ppoll_infinity_nocancel, but are calling it from generic code.
> 
> Okay, I'll switch this to use INLINE_SYSCALL_CALL, so that it sets
> errno, and then will use the normal TEMP_FAILURE_RETRY macro for EINTR.
> 
>> Also, by the way your patches cc'd directly to me get quarantined
>> because DKIM signature verification failed.  The non-patch messages
>> pass DKIM and are fine.
> 
> That sure is odd. The emails are all going through the MTA. rspamd bug?
> OpenSMTPD bug? Hmm...

I am having a similar issue, where my company email server (which is Google
in the end) is marking your patches as spam.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 18:42             ` Jason A. Donenfeld
  2022-07-26 19:18               ` Adhemerval Zanella Netto
@ 2022-07-26 19:24               ` Jason A. Donenfeld
  1 sibling, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 19:24 UTC (permalink / raw)
  To: Mark Harris; +Cc: libc-alpha, Florian Weimer, linux-crypto

On Tue, Jul 26, 2022 at 08:42:51PM +0200, Jason A. Donenfeld wrote:
> Hi Mark,
> 
> On Tue, Jul 26, 2022 at 09:51:03AM -0700, Mark Harris wrote:
> > > Thanks. It looks like INTERNAL_SYSCALL_CALL just returns the errno as-is
> > > as a return value, right? I'll adjust the code to account for that.
> > 
> > Yes INTERNAL_SYSCALL_CALL just returns the negated errno value that it
> > gets from the Linux kernel, but only on Linux does
> > __getrandom_nocancel use that.  The Hurd and generic implementations
> > set errno on error.  Previously the only call to this function did not
> > care about the specific error value so it didn't matter.  Since you
> > are now using the error value in generic code, __getrandom_nocancel
> > should be changed on Linux to set errno like most other _nocancel
> > calls, and then it should go back to checking errno here.
> > 
> > And as Adhemerval mentioned, you only added a Linux implementation of
> > __ppoll_infinity_nocancel, but are calling it from generic code.
> 
> Okay, I'll switch this to use INLINE_SYSCALL_CALL, so that it sets
> errno, and then will use the normal TEMP_FAILURE_RETRY macro for EINTR.
> 
> > Also, by the way your patches cc'd directly to me get quarantined
> > because DKIM signature verification failed.  The non-patch messages
> > pass DKIM and are fine.
> 
> That sure is odd. The emails are all going through the MTA. rspamd bug?
> OpenSMTPD bug? Hmm...

It's because LICENSES has a ^L in it, which I guess doesn't go over well
with OpenSMTPD or rspamd or kernel.org's smtp server or some combination
thereof...

I just posted v5, by the way, in case it's in your spam folder.

Jason

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-25 23:28     ` [PATCH v2] " Jason A. Donenfeld
  2022-07-25 23:59       ` Eric Biggers
  2022-07-26  1:10       ` Mark Harris
@ 2022-07-26  9:55       ` Florian Weimer
  2022-07-26 11:04         ` Jason A. Donenfeld
  2022-07-26 11:33       ` Adhemerval Zanella Netto
  3 siblings, 1 reply; 81+ messages in thread
From: Florian Weimer @ 2022-07-26  9:55 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: libc-alpha, Adhemerval Zanella Netto, Cristian Rodríguez,
	Paul Eggert, linux-crypto

* Jason A. Donenfeld:

> +      pfd.fd = TEMP_FAILURE_RETRY (
> +	  __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> +      if (pfd.fd < 0)
> +	arc4random_getrandom_failure ();
> +      if (__poll (&pfd, 1, -1) < 0)
> +	arc4random_getrandom_failure ();
> +      if (__close_nocancel (pfd.fd) < 0)
> +	arc4random_getrandom_failure ();

What happens if /dev/random is actually /dev/urandom?  Will the poll
call fail?

I think we need a no-cancel variant of poll here, and we also need to
handle EINTR gracefully.

Performance-wise, my 1000 element shuffle benchmark runs about 14 times
slower without userspace buffering.  (For comparison, just removing
ChaCha20 while keeping a 256-byte buffer makes it run roughly 25% slower
than current master.)  Our random() implementation is quite slow, so
arc4random() as a replacement call is competitive.  The unbuffered
version, not so much.

Running the benchmark, I see 40% of the time spent in chacha_permute in
the kernel, that is really quite odd.  Why doesn't the system call
overhead dominate?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26  9:55       ` Florian Weimer
@ 2022-07-26 11:04         ` Jason A. Donenfeld
  2022-07-26 11:07           ` [PATCH v3] " Jason A. Donenfeld
  2022-07-26 11:12           ` [PATCH v2] " Florian Weimer
  0 siblings, 2 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 11:04 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, Adhemerval Zanella Netto, Cristian Rodríguez,
	Paul Eggert, linux-crypto

Hi Florian,

On Tue, Jul 26, 2022 at 11:55:23AM +0200, Florian Weimer wrote:
> * Jason A. Donenfeld:
> 
> > +      pfd.fd = TEMP_FAILURE_RETRY (
> > +	  __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> > +      if (pfd.fd < 0)
> > +	arc4random_getrandom_failure ();
> > +      if (__poll (&pfd, 1, -1) < 0)
> > +	arc4random_getrandom_failure ();
> > +      if (__close_nocancel (pfd.fd) < 0)
> > +	arc4random_getrandom_failure ();
> 
> What happens if /dev/random is actually /dev/urandom?  Will the poll
> call fail?

Yes. I'm unsure if you're asking this because it'd be a nice
simplification to only have to open one fd, or because you're worried
about confusion. I don't think the confusion problem is one we should
take too seriously, but if you're concerned, we can always fstat and
check the maj/min. Seems a bit much, though.
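
A minimal sketch of what such an fstat check could look like, assuming the usual Linux device numbering (character device major 1, minor 8 for /dev/random and minor 9 for /dev/urandom). The helper name fd_is_mem_chardev and the enum constants are hypothetical and not part of the patch.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>

/* Assumed Linux numbering: /dev/random is 1:8, /dev/urandom is 1:9.  */
enum { MEM_MAJOR = 1, RANDOM_MINOR = 8, URANDOM_MINOR = 9 };

/* Hypothetical helper: return true only if FD refers to a character
   device with major MEM_MAJOR and the requested minor number.  */
static bool
fd_is_mem_chardev (int fd, unsigned int want_minor)
{
  struct stat st;

  if (fstat (fd, &st) != 0)
    return false; /* Bad descriptor or other failure.  */
  return S_ISCHR (st.st_mode)
	 && major (st.st_rdev) == MEM_MAJOR
	 && minor (st.st_rdev) == want_minor;
}
```

A caller would open "/dev/random", pass the descriptor with RANDOM_MINOR, and treat a false result as the node being something other than the blocking device.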

> I think we need a no-cancel variant of poll here, and we also need to
> handle EINTR gracefully.

Thanks for the note about poll nocancel. I'll try to add this. I don't
totally know how to manage that plumbing, but I'll give it my best shot.

> Performance-wise, my 1000 element shuffle benchmark runs about 14 times
> slower without userspace buffering.  (For comparison, just removing
> ChaCha20 while keeping a 256-byte buffer makes it run roughly 25% slower
> than current master.)  Our random() implementation is quite slow, so
> arc4random() as a replacement call is competitive.  The unbuffered
> version, not so much.

Yes, as mentioned, this is slower. But let's get something down first
that's *correct*, and then after we can start optimizing it. Let's not
prematurely optimize and create a problematic function that nobody
should use.

> Running the benchmark, I see 40% of the time spent in chacha_permute in
> the kernel, that is really quite odd.  Why doesn't the system call
> overhead dominate?

Huh, that is interesting. I guess if you're reading 4 bytes for an
integer, it winds up computing a whole chacha block each time, with half
of it doing fast key erasure and half of it being returnable to the
caller. When we later figure out a safer way to buffer, ostensibly this
will go away. But for now, we really should not prematurely optimize.
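
To make the per-call cost concrete, here is a rough sketch contrasting one getrandom() call per 32-bit value with a single request for a whole array. It uses only the public getrandom() interface; fill_random, fill_u32_one_by_one, and fill_u32_bulk are hypothetical names. Note that this is batching done by the caller for values it consumes right away, which is a different thing from the long-lived buffering inside libc that the patch removes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fill BUF with LEN random bytes, looping over short reads and EINTR.
   Returns 0 on success, -1 with errno set on failure.  */
static int
fill_random (void *buf, size_t len)
{
  unsigned char *p = buf;

  while (len > 0)
    {
      ssize_t got = getrandom (p, len, 0);
      if (got < 0)
	{
	  if (errno == EINTR)
	    continue; /* Interrupted by a signal; try again.  */
	  return -1;
	}
      if (got == 0)
	{
	  errno = EIO; /* Should not happen for a nonzero request.  */
	  return -1;
	}
      p += got;
      len -= (size_t) got;
    }
  return 0;
}

/* One system call (and one kernel block computation) per value.  */
static int
fill_u32_one_by_one (uint32_t *out, size_t count)
{
  for (size_t i = 0; i < count; i++)
    if (fill_random (&out[i], sizeof out[i]) != 0)
      return -1;
  return 0;
}

/* A single request covering the whole array.  */
static int
fill_u32_bulk (uint32_t *out, size_t count)
{
  return fill_random (out, count * sizeof *out);
}
```

Timing the two variants over the same number of values would be one way to see how much of the benchmark's cost comes from the per-call work rather than from the random bytes themselves.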

I'll have v3 out shortly with your suggested fixes.

Jason

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3] arc4random: simplify design for better safety
  2022-07-26 11:04         ` Jason A. Donenfeld
@ 2022-07-26 11:07           ` Jason A. Donenfeld
  2022-07-26 11:11             ` Jason A. Donenfeld
  2022-07-26 11:12           ` [PATCH v2] " Florian Weimer
  1 sibling, 1 reply; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 11:07 UTC (permalink / raw)
  To: libc-alpha
  Cc: Jason A. Donenfeld, Adhemerval Zanella Netto, Florian Weimer,
	Cristian Rodríguez, Paul Eggert, Mark Harris, Eric Biggers,
	linux-crypto

Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc can work
together on optimizing that.

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering
altogether. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.

This patch might be considered along the lines of, "premature
optimization is the root of all evil," in that the much more complex
implementation it replaces moves too fast without considering security
implications, whereas the incremental approach done here is a much safer
way of going about things. Once this lands, we can take our time in
optimizing this properly using new interplay between the kernel and
userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fall back to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, but it does to a decent swath of them, so generally
it's the best approximation we can do.

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 LICENSES                                      |  23 -
 NEWS                                          |   4 +-
 include/stdlib.h                              |   3 -
 io/Versions                                   |   1 +
 manual/math.texi                              |  13 +-
 stdlib/Makefile                               |   2 -
 stdlib/arc4random.c                           | 206 ++-----
 stdlib/arc4random.h                           |  48 --
 stdlib/chacha20.c                             | 191 ------
 stdlib/tst-arc4random-chacha20.c              | 167 -----
 sysdeps/aarch64/Makefile                      |   4 -
 sysdeps/aarch64/chacha20-aarch64.S            | 314 ----------
 sysdeps/aarch64/chacha20_arch.h               |  40 --
 sysdeps/generic/not-cancel.h                  |   2 +
 sysdeps/generic/tls-internal-struct.h         |   1 -
 sysdeps/generic/tls-internal.c                |  10 -
 sysdeps/mach/hurd/_Fork.c                     |   2 -
 sysdeps/nptl/_Fork.c                          |   2 -
 .../powerpc/powerpc64/be/multiarch/Makefile   |   4 -
 .../powerpc64/be/multiarch/chacha20-ppc.c     |   1 -
 .../powerpc64/be/multiarch/chacha20_arch.h    |  42 --
 sysdeps/powerpc/powerpc64/power8/Makefile     |   5 -
 .../powerpc/powerpc64/power8/chacha20-ppc.c   | 256 --------
 .../powerpc/powerpc64/power8/chacha20_arch.h  |  37 --
 sysdeps/s390/s390-64/Makefile                 |   6 -
 sysdeps/s390/s390-64/chacha20-s390x.S         | 573 ------------------
 sysdeps/s390/s390-64/chacha20_arch.h          |  45 --
 sysdeps/unix/sysv/linux/Makefile              |   3 +-
 sysdeps/unix/sysv/linux/Versions              |   1 +
 sysdeps/unix/sysv/linux/not-cancel.h          |   5 +
 .../sysv/linux/poll_nocancel.c}               |  16 +-
 sysdeps/unix/sysv/linux/tls-internal.c        |  10 -
 sysdeps/unix/sysv/linux/tls-internal.h        |   1 -
 sysdeps/x86_64/Makefile                       |   7 -
 sysdeps/x86_64/chacha20-amd64-avx2.S          | 328 ----------
 sysdeps/x86_64/chacha20-amd64-sse2.S          | 311 ----------
 sysdeps/x86_64/chacha20_arch.h                |  55 --
 37 files changed, 81 insertions(+), 2658 deletions(-)
 delete mode 100644 stdlib/arc4random.h
 delete mode 100644 stdlib/chacha20.c
 delete mode 100644 stdlib/tst-arc4random-chacha20.c
 delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S
 delete mode 100644 sysdeps/aarch64/chacha20_arch.h
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
 delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
 delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
 delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
 delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
 rename sysdeps/{generic/chacha20_arch.h => unix/sysv/linux/poll_nocancel.c} (68%)
 delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
 delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
 delete mode 100644 sysdeps/x86_64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index cd04fb6e84..530893b1dc 100644
--- a/LICENSES
+++ b/LICENSES
@@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov>
  You should have received a copy of the GNU Lesser General Public
  License along with this library; if not, see
  <https://www.gnu.org/licenses/>.  */
-\f
-sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S,
-sysdeps/x86_64/chacha20-amd64-avx2.S, and
-sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and
-sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt,
-with the following notices:
-
-Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-This file is part of Libgcrypt.
-
-Libgcrypt is free software; you can redistribute it and/or modify
-it under the terms of the GNU Lesser General Public License as
-published by the Free Software Foundation; either version 2.1 of
-the License, or (at your option) any later version.
-
-Libgcrypt is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU Lesser General Public License for more details.
-
-You should have received a copy of the GNU Lesser General Public
-License along with this program; if not, see <https://www.gnu.org/licenses/>.
diff --git a/NEWS b/NEWS
index 8420a65cd0..fe531bfe1e 100644
--- a/NEWS
+++ b/NEWS
@@ -61,8 +61,8 @@ Major new features:
   is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type).
 
 * The functions arc4random, arc4random_buf, and arc4random_uniform have been
-  added.  The functions use a pseudo-random number generator along with
-  entropy from the kernel.
+  added.  The functions wrap getrandom and/or /dev/urandom to return high-
+  quality randomness from the kernel.
 
 Deprecated and removed features, and other changes affecting compatibility:
 
diff --git a/include/stdlib.h b/include/stdlib.h
index cae7f7cdf8..db51f4a4f6 100644
--- a/include/stdlib.h
+++ b/include/stdlib.h
@@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform;
 libc_hidden_proto (__arc4random_uniform);
 extern void __arc4random_buf_internal (void *buffer, size_t len)
      attribute_hidden;
-/* Called from the fork function to reinitialize the internal cipher state
-   in child process.  */
-extern void __arc4random_fork_subprocess (void) attribute_hidden;
 
 extern double __strtod_internal (const char *__restrict __nptr,
 				 char **__restrict __endptr, int __group)
diff --git a/io/Versions b/io/Versions
index 4e19540885..b8660023e2 100644
--- a/io/Versions
+++ b/io/Versions
@@ -145,6 +145,7 @@ libc {
     __fcntl_nocancel;
     __open64_nocancel;
     __write_nocancel;
+    __poll_nocancel;
     __file_is_unchanged;
     __file_change_detection_for_stat;
     __file_change_detection_for_path;
diff --git a/manual/math.texi b/manual/math.texi
index 141695cc30..6d69bbff66 100644
--- a/manual/math.texi
+++ b/manual/math.texi
@@ -1993,17 +1993,10 @@ This section describes the random number functions provided as a GNU
 extension, based on OpenBSD interfaces.
 
 @Theglibc{} uses kernel entropy obtained either through @code{getrandom}
-or by reading @file{/dev/urandom} to seed and periodically re-seed the
-internal state.  A per-thread data pool is used, which allows fast output
-generation.
+or by reading @file{/dev/urandom} to seed.
 
-Although these functions provide higher random quality than ISO, BSD, and
-SVID functions, these still use a Pseudo-Random generator and should not
-be used in cryptographic contexts.
-
-The internal state is cleared and reseeded with kernel entropy on @code{fork}
-and @code{_Fork}.  It is not cleared on either a direct @code{clone} syscall
-or when using @theglibc{} @code{syscall} function.
+These functions provide higher random quality than ISO, BSD, and SVID
+functions, and may be used in cryptographic contexts.
 
 The prototypes for these functions are in @file{stdlib.h}.
 @pindex stdlib.h
diff --git a/stdlib/Makefile b/stdlib/Makefile
index a900962685..f7b25c1981 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -246,7 +246,6 @@ tests := \
   # tests
 
 tests-internal := \
-  tst-arc4random-chacha20 \
   tst-strtod1i \
   tst-strtod3 \
   tst-strtod4 \
@@ -256,7 +255,6 @@ tests-internal := \
   # tests-internal
 
 tests-static := \
-  tst-arc4random-chacha20 \
   tst-secure-getenv \
   # tests-static
 
diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
index 65547e79aa..ee49c7f551 100644
--- a/stdlib/arc4random.c
+++ b/stdlib/arc4random.c
@@ -1,4 +1,4 @@
-/* Pseudo Random Number Generator based on ChaCha20.
+/* Pseudo Random Number Generator
    Copyright (C) 2022 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,61 +16,14 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <arc4random.h>
 #include <errno.h>
 #include <not-cancel.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include <sys/poll.h>
 #include <sys/mman.h>
 #include <sys/param.h>
 #include <sys/random.h>
-#include <tls-internal.h>
-
-/* arc4random keeps two counters: 'have' is the current valid bytes not yet
-   consumed in 'buf' while 'count' is the maximum number of bytes until a
-   reseed.
-
-   Both the initial seed and reseed try to obtain entropy from the kernel
-   and abort the process if none could be obtained.
-
-   The state 'buf' improves the usage of the cipher calls, allowing to call
-   optimized implementations (if the architecture provides it) and minimize
-   function call overhead.  */
-
-#include <chacha20.c>
-
-/* Called from the fork function to reset the state.  */
-void
-__arc4random_fork_subprocess (void)
-{
-  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
-  if (state != NULL)
-    {
-      explicit_bzero (state, sizeof (*state));
-      /* Force key init.  */
-      state->count = -1;
-    }
-}
-
-/* Return the current thread random state or try to create one if there is
-   none available.  In the case malloc can not allocate a state, arc4random
-   will try to get entropy with arc4random_getentropy.  */
-static struct arc4random_state_t *
-arc4random_get_state (void)
-{
-  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
-  if (state == NULL)
-    {
-      state = malloc (sizeof (struct arc4random_state_t));
-      if (state != NULL)
-	{
-	  /* Force key initialization on first call.  */
-	  state->count = -1;
-	  __glibc_tls_internal ()->rand_state = state;
-	}
-    }
-  return state;
-}
 
 static void
 arc4random_getrandom_failure (void)
@@ -78,106 +31,72 @@ arc4random_getrandom_failure (void)
   __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
 }
 
-static void
-arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen)
+void
+__arc4random_buf (void *p, size_t n)
 {
-  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
+  static bool have_getrandom = true, seen_initialized = false;
+  int fd;
 
-  /* Mix optional user provided data.  */
-  if (rnd != NULL)
-    {
-      size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-      for (size_t i = 0; i < m; i++)
-	state->buf[i] ^= rnd[i];
-    }
-
-  /* Immediately reinit for backtracking resistance.  */
-  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
-  explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-}
-
-static void
-arc4random_getentropy (void *rnd, size_t len)
-{
-  if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len)
+  if (n == 0)
     return;
 
-  int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom",
-						  O_RDONLY | O_CLOEXEC));
-  if (fd != -1)
+  for (;;)
     {
-      uint8_t *p = rnd;
-      uint8_t *end = p + len;
-      do
-	{
-	  ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
-	  if (ret <= 0)
-	    arc4random_getrandom_failure ();
-	  p += ret;
-	}
-      while (p < end);
+      ssize_t l;
 
-      if (__close_nocancel (fd) == 0)
-	return;
-    }
-  arc4random_getrandom_failure ();
-}
+      if (!have_getrandom)
+	break;
 
-/* Check if the thread context STATE should be reseed with kernel entropy
-   depending of requested LEN bytes.  If there is less than requested,
-   the state is either initialized or reseeded, otherwise the internal
-   counter subtract the requested length.  */
-static void
-arc4random_check_stir (struct arc4random_state_t *state, size_t len)
-{
-  if (state->count <= len || state->count == -1)
-    {
-      uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
-      arc4random_getentropy (rnd, sizeof rnd);
-
-      if (state->count == -1)
-	chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
-      else
-	arc4random_rekey (state, rnd, sizeof rnd);
-
-      explicit_bzero (rnd, sizeof rnd);
-
-      /* Invalidate the buf.  */
-      state->have = 0;
-      memset (state->buf, 0, sizeof state->buf);
-      state->count = CHACHA20_RESEED_SIZE;
+      l = __getrandom_nocancel (p, n, 0);
+      if (l > 0)
+	{
+	  if ((size_t) l == n)
+	    return; /* Done reading, success. */
+	  p = (uint8_t *) p + l;
+	  n -= l;
+	  continue; /* Interrupted by a signal; keep going. */
+	}
+      else if (l == 0)
+	arc4random_getrandom_failure (); /* Weird, should never happen. */
+      else if (l == -EINTR)
+	continue; /* Interrupted by a signal; keep going. */
+      else if (l == -ENOSYS)
+	{
+	  have_getrandom = false;
+	  break; /* No syscall, so fallback to /dev/urandom. */
+	}
+      arc4random_getrandom_failure (); /* Unknown error, should never happen. */
     }
-  else
-    state->count -= len;
-}
 
-void
-__arc4random_buf (void *buffer, size_t len)
-{
-  struct arc4random_state_t *state = arc4random_get_state ();
-  if (__glibc_unlikely (state == NULL))
+  if (!seen_initialized)
     {
-      arc4random_getentropy (buffer, len);
-      return;
+      struct pollfd pfd = { .events = POLLIN };
+      pfd.fd = TEMP_FAILURE_RETRY (
+	  __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
+      if (pfd.fd < 0)
+	arc4random_getrandom_failure ();
+      if (TEMP_FAILURE_RETRY (__poll_nocancel (&pfd, 1, -1)) < 0)
+	arc4random_getrandom_failure ();
+      if (__close_nocancel (pfd.fd) < 0)
+	arc4random_getrandom_failure ();
+      seen_initialized = true;
     }
 
-  arc4random_check_stir (state, len);
-  while (len > 0)
+  fd = TEMP_FAILURE_RETRY (
+      __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY));
+  if (fd < 0)
+    arc4random_getrandom_failure ();
+  do
     {
-      if (state->have > 0)
-	{
-	  size_t m = MIN (len, state->have);
-	  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
-	  memcpy (buffer, ks, m);
-	  explicit_bzero (ks, m);
-	  buffer += m;
-	  len -= m;
-	  state->have -= m;
-	}
-      if (state->have == 0)
-	arc4random_rekey (state, NULL, 0);
+      ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
+      if (l <= 0)
+	arc4random_getrandom_failure ();
+      p = (uint8_t *) p + l;
+      n -= l;
     }
+  while (n);
+  if (__close_nocancel (fd) < 0)
+    arc4random_getrandom_failure ();
 }
 libc_hidden_def (__arc4random_buf)
 weak_alias (__arc4random_buf, arc4random_buf)
@@ -186,22 +105,7 @@ uint32_t
 __arc4random (void)
 {
   uint32_t r;
-
-  struct arc4random_state_t *state = arc4random_get_state ();
-  if (__glibc_unlikely (state == NULL))
-    {
-      arc4random_getentropy (&r, sizeof (uint32_t));
-      return r;
-    }
-
-  arc4random_check_stir (state, sizeof (uint32_t));
-  if (state->have < sizeof (uint32_t))
-    arc4random_rekey (state, NULL, 0);
-  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
-  memcpy (&r, ks, sizeof (uint32_t));
-  memset (ks, 0, sizeof (uint32_t));
-  state->have -= sizeof (uint32_t);
-
+  __arc4random_buf (&r, sizeof (r));
   return r;
 }
 libc_hidden_def (__arc4random)
diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
deleted file mode 100644
index cd39389c19..0000000000
--- a/stdlib/arc4random.h
+++ /dev/null
@@ -1,48 +0,0 @@
-/* Arc4random definition used on TLS.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#ifndef _CHACHA20_H
-#define _CHACHA20_H
-
-#include <stddef.h>
-#include <stdint.h>
-
-/* Internal ChaCha20 state.  */
-#define CHACHA20_STATE_LEN	16
-#define CHACHA20_BLOCK_SIZE	64
-
-/* Maximum number bytes until reseed (16 MB).  */
-#define CHACHA20_RESEED_SIZE	(16 * 1024 * 1024)
-
-/* Internal arc4random buffer, used on each feedback step so offer some
-   backtracking protection and to allow better used of vectorized
-   chacha20 implementations.  */
-#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
-
-_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE,
-		"CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE");
-
-struct arc4random_state_t
-{
-  uint32_t ctx[CHACHA20_STATE_LEN];
-  size_t have;
-  size_t count;
-  uint8_t buf[CHACHA20_BUFSIZE];
-};
-
-#endif
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
deleted file mode 100644
index 2745a81315..0000000000
--- a/stdlib/chacha20.c
+++ /dev/null
@@ -1,191 +0,0 @@
-/* Generic ChaCha20 implementation (used on arc4random).
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <array_length.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <string.h>
-
-/* 32-bit stream position, then 96-bit nonce.  */
-#define CHACHA20_IV_SIZE	16
-#define CHACHA20_KEY_SIZE	32
-
-#define CHACHA20_STATE_LEN	16
-
-/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final
-   XOR of the keystream with the plaintext because the plaintext is a
-   stream of zeros.  */
-
-enum chacha20_constants
-{
-  CHACHA20_CONSTANT_EXPA = 0x61707865U,
-  CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
-  CHACHA20_CONSTANT_2_BY = 0x79622d32U,
-  CHACHA20_CONSTANT_TE_K = 0x6b206574U
-};
-
-static inline uint32_t
-read_unaligned_32 (const uint8_t *p)
-{
-  uint32_t r;
-  memcpy (&r, p, sizeof (r));
-  return r;
-}
-
-static inline void
-write_unaligned_32 (uint8_t *p, uint32_t v)
-{
-  memcpy (p, &v, sizeof (v));
-}
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
-# define set_state(v)		__builtin_bswap32 ((v))
-#else
-# define read_unaligned_le32(p) read_unaligned_32 ((p))
-# define set_state(v)		(v)
-#endif
-
-static inline void
-chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
-{
-  state[0]  = CHACHA20_CONSTANT_EXPA;
-  state[1]  = CHACHA20_CONSTANT_ND_3;
-  state[2]  = CHACHA20_CONSTANT_2_BY;
-  state[3]  = CHACHA20_CONSTANT_TE_K;
-
-  state[4]  = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
-  state[5]  = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
-  state[6]  = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
-  state[7]  = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
-  state[8]  = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
-  state[9]  = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
-  state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
-  state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
-
-  state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
-  state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
-  state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
-  state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
-}
-
-static inline uint32_t
-rotl32 (unsigned int shift, uint32_t word)
-{
-  return (word << (shift & 31)) | (word >> ((-shift) & 31));
-}
-
-static void
-state_final (const uint8_t *src, uint8_t *dst, uint32_t v)
-{
-#ifdef CHACHA20_XOR_FINAL
-  v ^= read_unaligned_32 (src);
-#endif
-  write_unaligned_32 (dst, v);
-}
-
-static inline void
-chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src)
-{
-  uint32_t x0, x1, x2, x3, x4, x5, x6, x7;
-  uint32_t x8, x9, x10, x11, x12, x13, x14, x15;
-
-  x0 = state[0];
-  x1 = state[1];
-  x2 = state[2];
-  x3 = state[3];
-  x4 = state[4];
-  x5 = state[5];
-  x6 = state[6];
-  x7 = state[7];
-  x8 = state[8];
-  x9 = state[9];
-  x10 = state[10];
-  x11 = state[11];
-  x12 = state[12];
-  x13 = state[13];
-  x14 = state[14];
-  x15 = state[15];
-
-  for (int i = 0; i < 20; i += 2)
-    {
-#define QROUND(_x0, _x1, _x2, _x3) 			\
-  do {							\
-   _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); 	\
-   _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); 	\
-   _x0 = _x0 + _x1; _x3 = rotl32 (8,  (_x0 ^ _x3));	\
-   _x2 = _x2 + _x3; _x1 = rotl32 (7,  (_x1 ^ _x2));	\
-  } while(0)
-
-      QROUND (x0, x4, x8,  x12);
-      QROUND (x1, x5, x9,  x13);
-      QROUND (x2, x6, x10, x14);
-      QROUND (x3, x7, x11, x15);
-
-      QROUND (x0, x5, x10, x15);
-      QROUND (x1, x6, x11, x12);
-      QROUND (x2, x7, x8,  x13);
-      QROUND (x3, x4, x9,  x14);
-    }
-
-  state_final (&src[0], &dst[0], set_state (x0 + state[0]));
-  state_final (&src[4], &dst[4], set_state (x1 + state[1]));
-  state_final (&src[8], &dst[8], set_state (x2 + state[2]));
-  state_final (&src[12], &dst[12], set_state (x3 + state[3]));
-  state_final (&src[16], &dst[16], set_state (x4 + state[4]));
-  state_final (&src[20], &dst[20], set_state (x5 + state[5]));
-  state_final (&src[24], &dst[24], set_state (x6 + state[6]));
-  state_final (&src[28], &dst[28], set_state (x7 + state[7]));
-  state_final (&src[32], &dst[32], set_state (x8 + state[8]));
-  state_final (&src[36], &dst[36], set_state (x9 + state[9]));
-  state_final (&src[40], &dst[40], set_state (x10 + state[10]));
-  state_final (&src[44], &dst[44], set_state (x11 + state[11]));
-  state_final (&src[48], &dst[48], set_state (x12 + state[12]));
-  state_final (&src[52], &dst[52], set_state (x13 + state[13]));
-  state_final (&src[56], &dst[56], set_state (x14 + state[14]));
-  state_final (&src[60], &dst[60], set_state (x15 + state[15]));
-
-  state[12]++;
-}
-
-static void
-__attribute_maybe_unused__
-chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
-			size_t bytes)
-{
-  while (bytes >= CHACHA20_BLOCK_SIZE)
-    {
-      chacha20_block (state, dst, src);
-
-      bytes -= CHACHA20_BLOCK_SIZE;
-      dst += CHACHA20_BLOCK_SIZE;
-      src += CHACHA20_BLOCK_SIZE;
-    }
-
-  if (__glibc_unlikely (bytes != 0))
-    {
-      uint8_t stream[CHACHA20_BLOCK_SIZE];
-      chacha20_block (state, stream, src);
-      memcpy (dst, stream, bytes);
-      explicit_bzero (stream, sizeof stream);
-    }
-}
-
-/* Get the architecture optimized version.  */
-#include <chacha20_arch.h>
diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c
deleted file mode 100644
index 45ba54920d..0000000000
--- a/stdlib/tst-arc4random-chacha20.c
+++ /dev/null
@@ -1,167 +0,0 @@
-/* Basic tests for chacha20 cypher used in arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <arc4random.h>
-#include <support/check.h>
-#include <sys/cdefs.h>
-
-/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random
-   actual does.  */
-#include <chacha20.c>
-
-static int
-do_test (void)
-{
-  const uint8_t key[CHACHA20_KEY_SIZE] =
-    {
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-    };
-  const uint8_t iv[CHACHA20_IV_SIZE] =
-    {
-      0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-    };
-  const uint8_t expected1[CHACHA20_BUFSIZE] =
-    {
-      0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a,
-      0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d,
-      0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda,
-      0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f,
-      0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1,
-      0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07,
-      0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73,
-      0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69,
-      0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21,
-      0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8,
-      0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac,
-      0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6,
-      0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0,
-      0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3,
-      0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f,
-      0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e,
-      0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71,
-      0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3,
-      0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5,
-      0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34,
-      0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d,
-      0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c,
-      0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31,
-      0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad,
-      0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0,
-      0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72,
-      0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16,
-      0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9,
-      0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a,
-      0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc,
-      0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2,
-      0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb,
-      0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10,
-      0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b,
-      0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2,
-      0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17,
-      0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca,
-      0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84,
-      0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3,
-      0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1,
-      0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7,
-      0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9,
-      0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3,
-      0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e,
-      0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb,
-      0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8,
-      0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb
-    };
-
-  const uint8_t expected2[CHACHA20_BUFSIZE] =
-    {
-      0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36,
-      0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22,
-      0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50,
-      0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10,
-      0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef,
-      0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30,
-      0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c,
-      0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6,
-      0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e,
-      0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90,
-      0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3,
-      0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba,
-      0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4,
-      0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea,
-      0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a,
-      0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80,
-      0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2,
-      0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8,
-      0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce,
-      0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb,
-      0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9,
-      0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc,
-      0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86,
-      0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0,
-      0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4,
-      0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1,
-      0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4,
-      0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f,
-      0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57,
-      0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97,
-      0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef,
-      0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13,
-      0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65,
-      0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30,
-      0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6,
-      0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee,
-      0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c,
-      0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66,
-      0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a,
-      0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c,
-      0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65,
-      0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b,
-      0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88,
-      0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1,
-      0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0,
-      0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36,
-      0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18
-    };
-
-  /* Check with the expected internal arc4random keystream buffer.  Some
-     architecture optimizations expects a buffer with a minimum size which
-     is a multiple of then ChaCha20 blocksize, so they might not be prepared
-     to handle smaller buffers.  */
-
-  uint8_t output[CHACHA20_BUFSIZE];
-
-  uint32_t state[CHACHA20_STATE_LEN];
-  chacha20_init (state, key, iv);
-
-  /* Check with the initial state.  */
-  uint8_t input[CHACHA20_BUFSIZE] = { 0 };
-
-  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
-  TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE);
-
-  /* And on the next round.  */
-  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
-  TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE);
-
-  return 0;
-}
-
-#include <support/test-driver.c>
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 7dfd1b62dd..17fb1c5b72 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -51,10 +51,6 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += tlsdesc.sym
 endif
 
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-aarch64
-endif
-
 ifeq ($(subdir),gmon)
 CFLAGS-mcount.c += -mgeneral-regs-only
 endif
diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S
deleted file mode 100644
index cce5291c5c..0000000000
--- a/sysdeps/aarch64/chacha20-aarch64.S
+++ /dev/null
@@ -1,314 +0,0 @@
-/* Optimized AArch64 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-
-/* Only LE is supported.  */
-#ifdef __AARCH64EL__
-
-#define GET_DATA_POINTER(reg, name) \
-        adrp    reg, name ; \
-        add     reg, reg, :lo12:name
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
-        ret; dsb sy; isb;
-
-.cpu generic+simd
-
-.text
-
-/* register macros */
-#define INPUT     x0
-#define DST       x1
-#define SRC       x2
-#define NBLKS     x3
-#define ROUND     x4
-#define INPUT_CTR x5
-#define INPUT_POS x6
-#define CTR       x7
-
-/* vector registers */
-#define X0 v16
-#define X4 v17
-#define X8 v18
-#define X12 v19
-
-#define X1 v20
-#define X5 v21
-
-#define X9 v22
-#define X13 v23
-#define X2 v24
-#define X6 v25
-
-#define X3 v26
-#define X7 v27
-#define X11 v28
-#define X15 v29
-
-#define X10 v30
-#define X14 v31
-
-#define VCTR    v0
-#define VTMP0   v1
-#define VTMP1   v2
-#define VTMP2   v3
-#define VTMP3   v4
-#define X12_TMP v5
-#define X13_TMP v6
-#define ROT8    v7
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-#define _(...) __VA_ARGS__
-
-#define vpunpckldq(s1, s2, dst) \
-	zip1 dst.4s, s2.4s, s1.4s;
-
-#define vpunpckhdq(s1, s2, dst) \
-	zip2 dst.4s, s2.4s, s1.4s;
-
-#define vpunpcklqdq(s1, s2, dst) \
-	zip1 dst.2d, s2.2d, s1.2d;
-
-#define vpunpckhqdq(s1, s2, dst) \
-	zip2 dst.2d, s2.2d, s1.2d;
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
-	vpunpckhdq(x1, x0, t2); \
-	vpunpckldq(x1, x0, x0); \
-	\
-	vpunpckldq(x3, x2, t1); \
-	vpunpckhdq(x3, x2, x2); \
-	\
-	vpunpckhqdq(t1, x0, x1); \
-	vpunpcklqdq(t1, x0, x0); \
-	\
-	vpunpckhqdq(x2, t2, x3); \
-	vpunpcklqdq(x2, t2, x2);
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define XOR(d,s1,s2) \
-	eor d.16b, s2.16b, s1.16b;
-
-#define PLUS(ds,s) \
-	add ds.4s, ds.4s, s.4s;
-
-#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \
-	shl dst1.4s, src1.4s, #(c);		\
-	shl dst2.4s, src2.4s, #(c);		\
-	shl dst3.4s, src3.4s, #(c);		\
-	shl dst4.4s, src4.4s, #(c);		\
-	sri dst1.4s, src1.4s, #(32 - (c));	\
-	sri dst2.4s, src2.4s, #(32 - (c));	\
-	sri dst3.4s, src3.4s, #(32 - (c));	\
-	sri dst4.4s, src4.4s, #(32 - (c));
-
-#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
-	tbl dst1.16b, {src1.16b}, ROT8.16b;     \
-	tbl dst2.16b, {src2.16b}, ROT8.16b;	\
-	tbl dst3.16b, {src3.16b}, ROT8.16b;	\
-	tbl dst4.16b, {src4.16b}, ROT8.16b;
-
-#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
-	rev32 dst1.8h, src1.8h;			\
-	rev32 dst2.8h, src2.8h;			\
-	rev32 dst3.8h, src3.8h;			\
-	rev32 dst4.8h, src4.8h;
-
-#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \
-	PLUS(a1,b1); PLUS(a2,b2);						\
-	PLUS(a3,b3); PLUS(a4,b4);						\
-	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
-	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
-		ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4);		\
-	PLUS(c1,d1); PLUS(c2,d2);						\
-	PLUS(c3,d3); PLUS(c4,d4);						\
-	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
-	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
-		ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4)		\
-	PLUS(a1,b1); PLUS(a2,b2);						\
-	PLUS(a3,b3); PLUS(a4,b4);						\
-	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
-	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
-		ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4)		\
-	PLUS(c1,d1); PLUS(c2,d2);						\
-	PLUS(c3,d3); PLUS(c4,d4);						\
-	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
-	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
-		ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4)		\
-
-.align 4
-L(__chacha20_blocks4_data_inc_counter):
-	.long 0,1,2,3
-
-.align 4
-L(__chacha20_blocks4_data_rot8):
-	.byte 3,0,1,2
-	.byte 7,4,5,6
-	.byte 11,8,9,10
-	.byte 15,12,13,14
-
-.hidden __chacha20_neon_blocks4
-ENTRY (__chacha20_neon_blocks4)
-	/* input:
-	 *	x0: input
-	 *	x1: dst
-	 *	x2: src
-	 *	x3: nblks (multiple of 4)
-	 */
-
-	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8))
-	add INPUT_CTR, INPUT, #(12*4);
-	ld1 {ROT8.16b}, [CTR];
-	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter))
-	mov INPUT_POS, INPUT;
-	ld1 {VCTR.16b}, [CTR];
-
-L(loop4):
-	/* Construct counter vectors X12 and X13 */
-
-	ld1 {X15.16b}, [INPUT_CTR];
-	mov ROUND, #20;
-	ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
-
-	dup X12.4s, X15.s[0];
-	dup X13.4s, X15.s[1];
-	ldr CTR, [INPUT_CTR];
-	add X12.4s, X12.4s, VCTR.4s;
-	dup X0.4s, VTMP1.s[0];
-	dup X1.4s, VTMP1.s[1];
-	dup X2.4s, VTMP1.s[2];
-	dup X3.4s, VTMP1.s[3];
-	dup X14.4s, X15.s[2];
-	cmhi VTMP0.4s, VCTR.4s, X12.4s;
-	dup X15.4s, X15.s[3];
-	add CTR, CTR, #4; /* Update counter */
-	dup X4.4s, VTMP2.s[0];
-	dup X5.4s, VTMP2.s[1];
-	dup X6.4s, VTMP2.s[2];
-	dup X7.4s, VTMP2.s[3];
-	sub X13.4s, X13.4s, VTMP0.4s;
-	dup X8.4s, VTMP3.s[0];
-	dup X9.4s, VTMP3.s[1];
-	dup X10.4s, VTMP3.s[2];
-	dup X11.4s, VTMP3.s[3];
-	mov X12_TMP.16b, X12.16b;
-	mov X13_TMP.16b, X13.16b;
-	str CTR, [INPUT_CTR];
-
-L(round2):
-	subs ROUND, ROUND, #2
-	QUARTERROUND4(X0, X4,  X8, X12,   X1, X5,  X9, X13,
-		      X2, X6, X10, X14,   X3, X7, X11, X15,
-		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
-	QUARTERROUND4(X0, X5, X10, X15,   X1, X6, X11, X12,
-		      X2, X7,  X8, X13,   X3, X4,  X9, X14,
-		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
-	b.ne L(round2);
-
-	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
-
-	PLUS(X12, X12_TMP);        /* INPUT + 12 * 4 + counter */
-	PLUS(X13, X13_TMP);        /* INPUT + 13 * 4 + counter */
-
-	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
-	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
-	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
-	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
-	PLUS(X0, VTMP2);
-	PLUS(X1, VTMP3);
-	PLUS(X2, X12_TMP);
-	PLUS(X3, X13_TMP);
-
-	dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
-	dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
-	dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
-	dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
-	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
-	mov INPUT_POS, INPUT;
-	PLUS(X4, VTMP2);
-	PLUS(X5, VTMP3);
-	PLUS(X6, X12_TMP);
-	PLUS(X7, X13_TMP);
-
-	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
-	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
-	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
-	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
-	dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
-	dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
-	PLUS(X8, VTMP2);
-	PLUS(X9, VTMP3);
-	PLUS(X10, X12_TMP);
-	PLUS(X11, X13_TMP);
-	PLUS(X14, VTMP0);
-	PLUS(X15, VTMP1);
-
-	transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
-
-	subs NBLKS, NBLKS, #4;
-
-	st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64
-	st1 {X1.16b,X5.16b}, [DST], #32;
-	st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64
-	st1 {X10.16b,X14.16b}, [DST], #32;
-	st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64;
-
-	b.ne L(loop4);
-
-	ret_spec_stop
-END (__chacha20_neon_blocks4)
-
-#endif
diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
deleted file mode 100644
index 37dbb917f1..0000000000
--- a/sysdeps/aarch64/chacha20_arch.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <ldsodefs.h>
-#include <stdbool.h>
-
-unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
-#ifdef __AARCH64EL__
-  __chacha20_neon_blocks4 (state, dst, src,
-			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  chacha20_crypt_generic (state, dst, src, bytes);
-#endif
-}
diff --git a/sysdeps/generic/not-cancel.h b/sysdeps/generic/not-cancel.h
index acceb9b67f..bd60643599 100644
--- a/sysdeps/generic/not-cancel.h
+++ b/sysdeps/generic/not-cancel.h
@@ -50,5 +50,7 @@
   __fcntl64 (fd, cmd, __VA_ARGS__)
 #define __getrandom_nocancel(buf, size, flags) \
   __getrandom (buf, size, flags)
+#define __poll_nocancel(fd) \
+  __poll (fd)
 
 #endif /* NOT_CANCEL_H  */
diff --git a/sysdeps/generic/tls-internal-struct.h b/sysdeps/generic/tls-internal-struct.h
index a91915831b..d76c715a96 100644
--- a/sysdeps/generic/tls-internal-struct.h
+++ b/sysdeps/generic/tls-internal-struct.h
@@ -23,7 +23,6 @@ struct tls_internal_t
 {
   char *strsignal_buf;
   char *strerror_l_buf;
-  struct arc4random_state_t *rand_state;
 };
 
 #endif
diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
index 8a0f37d509..b32b31b5a9 100644
--- a/sysdeps/generic/tls-internal.c
+++ b/sysdeps/generic/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -27,13 +26,4 @@ __glibc_tls_internal_free (void)
 {
   free (__tls_internal.strsignal_buf);
   free (__tls_internal.strerror_l_buf);
-
-  if (__tls_internal.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-	 cached it won't leak any data.  */
-      explicit_bzero (__tls_internal.rand_state,
-		      sizeof (*__tls_internal.rand_state));
-      free (__tls_internal.rand_state);
-    }
 }
diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
index 667068c8cf..e60b86fab1 100644
--- a/sysdeps/mach/hurd/_Fork.c
+++ b/sysdeps/mach/hurd/_Fork.c
@@ -662,8 +662,6 @@ retry:
       _hurd_malloc_fork_child ();
       call_function_static_weak (__malloc_fork_unlock_child);
 
-      call_function_static_weak (__arc4random_fork_subprocess);
-
       /* Run things that want to run in the child task to set up.  */
       RUN_HOOK (_hurd_fork_child_hook, ());
 
diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index 7dc02569f6..dd568992e2 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -43,8 +43,6 @@ _Fork (void)
       self->robust_head.list = &self->robust_head;
       INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
 			     sizeof (struct robust_list_head));
-
-      call_function_static_weak (__arc4random_fork_subprocess);
     }
   return pid;
 }
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
deleted file mode 100644
index 8c75165f7f..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
+++ /dev/null
@@ -1,4 +0,0 @@
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
deleted file mode 100644
index cf9e735326..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
deleted file mode 100644
index 08494dc045..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
+++ /dev/null
@@ -1,42 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-					const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-		const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  unsigned long int hwcap = GLRO(dl_hwcap);
-  unsigned long int hwcap2 = GLRO(dl_hwcap2);
-  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
-    __chacha20_power8_blocks4 (state, dst, src,
-			       CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
index abb0aa3f11..71a59529f3 100644
--- a/sysdeps/powerpc/powerpc64/power8/Makefile
+++ b/sysdeps/powerpc/powerpc64/power8/Makefile
@@ -1,8 +1,3 @@
 ifeq ($(subdir),string)
 sysdep_routines += strcasestr-ppc64
 endif
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
deleted file mode 100644
index 0bbdcb9363..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/* Optimized PowerPC implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20
-   Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <altivec.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <sys/cdefs.h>
-
-typedef vector unsigned char vector16x_u8;
-typedef vector unsigned int vector4x_u32;
-typedef vector unsigned long long vector2x_u64;
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-static const vector16x_u8 le_bswap_const =
-  { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
-#endif
-
-static inline vector4x_u32
-vec_rol_elems (vector4x_u32 v, unsigned int idx)
-{
-#if __BYTE_ORDER != __BIG_ENDIAN
-  return vec_sld (v, v, (16 - (4 * idx)) & 15);
-#else
-  return vec_sld (v, v, (4 * idx) & 15);
-#endif
-}
-
-static inline vector4x_u32
-vec_load_le (unsigned long offset, const unsigned char *ptr)
-{
-  vector4x_u32 vec;
-  vec = vec_vsx_ld (offset, (const uint32_t *)ptr);
-#if __BYTE_ORDER == __BIG_ENDIAN
-  vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec,
-				 le_bswap_const);
-#endif
-  return vec;
-}
-
-static inline void
-vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
-  vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec,
-			       le_bswap_const);
-#endif
-  vec_vsx_st (vec, offset, (uint32_t *)ptr);
-}
-
-
-static inline vector4x_u32
-vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
-  static const vector16x_u8 swap32 =
-    { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 };
-  vector2x_u64 vec, add, sum;
-
-  vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32);
-  add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32);
-  sum = vec + add;
-  return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32);
-#else
-  return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a));
-#endif
-}
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define ROTATE(v1,rolv)			\
-	__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv))
-
-#define PLUS(ds,s) \
-	((ds) += (s))
-
-#define XOR(ds,s) \
-	((ds) ^= (s))
-
-#define ADD_U64(v,a) \
-	(v = vec_add_ctr_u64(v, a))
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3) ({ \
-	vector4x_u32 t1 = vec_mergeh(x0, x2); \
-	vector4x_u32 t2 = vec_mergel(x0, x2); \
-	vector4x_u32 t3 = vec_mergeh(x1, x3); \
-	x3 = vec_mergel(x1, x3); \
-	x0 = vec_mergeh(t1, t3); \
-	x1 = vec_mergel(t1, t3); \
-	x2 = vec_mergeh(t2, x3); \
-	x3 = vec_mergel(t2, x3); \
-      })
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2)			\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE(d1, rotate_16); ROTATE(d2, rotate_16);	\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE(b1, rotate_12); ROTATE(b2, rotate_12);	\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE(d1, rotate_8); ROTATE(d2, rotate_8);		\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE(b1, rotate_7); ROTATE(b2, rotate_7);
-
-unsigned int attribute_hidden
-__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src,
-			   size_t nblks)
-{
-  vector4x_u32 counters_0123 = { 0, 1, 2, 3 };
-  vector4x_u32 counter_4 = { 4, 0, 0, 0 };
-  vector4x_u32 rotate_16 = { 16, 16, 16, 16 };
-  vector4x_u32 rotate_12 = { 12, 12, 12, 12 };
-  vector4x_u32 rotate_8 = { 8, 8, 8, 8 };
-  vector4x_u32 rotate_7 = { 7, 7, 7, 7 };
-  vector4x_u32 state0, state1, state2, state3;
-  vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7;
-  vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15;
-  vector4x_u32 tmp;
-  int i;
-
-  /* Force preload of constants to vector registers.  */
-  __asm__ ("": "+v" (counters_0123) :: "memory");
-  __asm__ ("": "+v" (counter_4) :: "memory");
-  __asm__ ("": "+v" (rotate_16) :: "memory");
-  __asm__ ("": "+v" (rotate_12) :: "memory");
-  __asm__ ("": "+v" (rotate_8) :: "memory");
-  __asm__ ("": "+v" (rotate_7) :: "memory");
-
-  state0 = vec_vsx_ld (0 * 16, state);
-  state1 = vec_vsx_ld (1 * 16, state);
-  state2 = vec_vsx_ld (2 * 16, state);
-  state3 = vec_vsx_ld (3 * 16, state);
-
-  do
-    {
-      v0 = vec_splat (state0, 0);
-      v1 = vec_splat (state0, 1);
-      v2 = vec_splat (state0, 2);
-      v3 = vec_splat (state0, 3);
-      v4 = vec_splat (state1, 0);
-      v5 = vec_splat (state1, 1);
-      v6 = vec_splat (state1, 2);
-      v7 = vec_splat (state1, 3);
-      v8 = vec_splat (state2, 0);
-      v9 = vec_splat (state2, 1);
-      v10 = vec_splat (state2, 2);
-      v11 = vec_splat (state2, 3);
-      v12 = vec_splat (state3, 0);
-      v13 = vec_splat (state3, 1);
-      v14 = vec_splat (state3, 2);
-      v15 = vec_splat (state3, 3);
-
-      v12 += counters_0123;
-      v13 -= vec_cmplt (v12, counters_0123);
-
-      for (i = 20; i > 0; i -= 2)
-	{
-	  QUARTERROUND2 (v0, v4,  v8, v12,   v1, v5,  v9, v13)
-	  QUARTERROUND2 (v2, v6, v10, v14,   v3, v7, v11, v15)
-	  QUARTERROUND2 (v0, v5, v10, v15,   v1, v6, v11, v12)
-	  QUARTERROUND2 (v2, v7,  v8, v13,   v3, v4,  v9, v14)
-	}
-
-      v0 += vec_splat (state0, 0);
-      v1 += vec_splat (state0, 1);
-      v2 += vec_splat (state0, 2);
-      v3 += vec_splat (state0, 3);
-      v4 += vec_splat (state1, 0);
-      v5 += vec_splat (state1, 1);
-      v6 += vec_splat (state1, 2);
-      v7 += vec_splat (state1, 3);
-      v8 += vec_splat (state2, 0);
-      v9 += vec_splat (state2, 1);
-      v10 += vec_splat (state2, 2);
-      v11 += vec_splat (state2, 3);
-      tmp = vec_splat( state3, 0);
-      tmp += counters_0123;
-      v12 += tmp;
-      v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123);
-      v14 += vec_splat (state3, 2);
-      v15 += vec_splat (state3, 3);
-      ADD_U64 (state3, counter_4);
-
-      transpose_4x4 (v0, v1, v2, v3);
-      transpose_4x4 (v4, v5, v6, v7);
-      transpose_4x4 (v8, v9, v10, v11);
-      transpose_4x4 (v12, v13, v14, v15);
-
-      vec_store_le (v0, (64 * 0 + 16 * 0), dst);
-      vec_store_le (v1, (64 * 1 + 16 * 0), dst);
-      vec_store_le (v2, (64 * 2 + 16 * 0), dst);
-      vec_store_le (v3, (64 * 3 + 16 * 0), dst);
-
-      vec_store_le (v4, (64 * 0 + 16 * 1), dst);
-      vec_store_le (v5, (64 * 1 + 16 * 1), dst);
-      vec_store_le (v6, (64 * 2 + 16 * 1), dst);
-      vec_store_le (v7, (64 * 3 + 16 * 1), dst);
-
-      vec_store_le (v8, (64 * 0 + 16 * 2), dst);
-      vec_store_le (v9, (64 * 1 + 16 * 2), dst);
-      vec_store_le (v10, (64 * 2 + 16 * 2), dst);
-      vec_store_le (v11, (64 * 3 + 16 * 2), dst);
-
-      vec_store_le (v12, (64 * 0 + 16 * 3), dst);
-      vec_store_le (v13, (64 * 1 + 16 * 3), dst);
-      vec_store_le (v14, (64 * 2 + 16 * 3), dst);
-      vec_store_le (v15, (64 * 3 + 16 * 3), dst);
-
-      src += 4*64;
-      dst += 4*64;
-
-      nblks -= 4;
-    }
-  while (nblks);
-
-  vec_vsx_st (state3, 3 * 16, state);
-
-  return 0;
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
deleted file mode 100644
index ded06762b6..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
+++ /dev/null
@@ -1,37 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-					const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-		const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  __chacha20_power8_blocks4 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-}
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index 96c110f490..66ed844e68 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache
 endif
 
 endif # $(subdir) == elf
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
-  chacha20-s390x \
-  # sysdep_routines
-endif
diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S
deleted file mode 100644
index e38504d370..0000000000
--- a/sysdeps/s390/s390-64/chacha20-s390x.S
+++ /dev/null
@@ -1,573 +0,0 @@
-/* Optimized s390x implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-s390x.S  -  zSeries implementation of ChaCha20 cipher
-
-   Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <sysdep.h>
-
-#ifdef HAVE_S390_VX_ASM_SUPPORT
-
-/* CFA expressions are used for pointing CFA and registers to
- * SP relative offsets. */
-# define DW_REGNO_SP 15
-
-/* Fixed length encoding used for integers for now. */
-# define DW_SLEB128_7BIT(value) \
-        0x00|((value) & 0x7f)
-# define DW_SLEB128_28BIT(value) \
-        0x80|((value)&0x7f), \
-        0x80|(((value)>>7)&0x7f), \
-        0x80|(((value)>>14)&0x7f), \
-        0x00|(((value)>>21)&0x7f)
-
-# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \
-        .cfi_escape \
-          0x0f, /* DW_CFA_def_cfa_expression */ \
-            DW_SLEB128_7BIT(11), /* length */ \
-          0x7f, /* DW_OP_breg15, rsp + constant */ \
-            DW_SLEB128_28BIT(rsp_offs), \
-          0x06, /* DW_OP_deref */ \
-          0x23, /* DW_OP_plus_constu */ \
-            DW_SLEB128_28BIT((cfa_depth)+160)
-
-.machine "z13+vx"
-.text
-
-.balign 16
-.Lconsts:
-.Lwordswap:
-	.byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3
-.Lbswap128:
-	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-.Lbswap32:
-	.byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
-.Lone:
-	.long 0, 0, 0, 1
-.Ladd_counter_0123:
-	.long 0, 1, 2, 3
-.Ladd_counter_4567:
-	.long 4, 5, 6, 7
-
-/* register macros */
-#define INPUT %r2
-#define DST   %r3
-#define SRC   %r4
-#define NBLKS %r0
-#define ROUND %r1
-
-/* stack structure */
-
-#define STACK_FRAME_STD    (8 * 16 + 8 * 4)
-#define STACK_FRAME_F8_F15 (8 * 8)
-#define STACK_FRAME_Y0_Y15 (16 * 16)
-#define STACK_FRAME_CTR    (4 * 16)
-#define STACK_FRAME_PARAMS (6 * 8)
-
-#define STACK_MAX   (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \
-		     STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \
-		     STACK_FRAME_PARAMS)
-
-#define STACK_F8     (STACK_MAX - STACK_FRAME_F8_F15)
-#define STACK_F9     (STACK_F8 + 8)
-#define STACK_F10    (STACK_F9 + 8)
-#define STACK_F11    (STACK_F10 + 8)
-#define STACK_F12    (STACK_F11 + 8)
-#define STACK_F13    (STACK_F12 + 8)
-#define STACK_F14    (STACK_F13 + 8)
-#define STACK_F15    (STACK_F14 + 8)
-#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15)
-#define STACK_CTR    (STACK_Y0_Y15 - STACK_FRAME_CTR)
-#define STACK_INPUT  (STACK_CTR - STACK_FRAME_PARAMS)
-#define STACK_DST    (STACK_INPUT + 8)
-#define STACK_SRC    (STACK_DST + 8)
-#define STACK_NBLKS  (STACK_SRC + 8)
-#define STACK_POCTX  (STACK_NBLKS + 8)
-#define STACK_POSRC  (STACK_POCTX + 8)
-
-#define STACK_G0_H3  STACK_Y0_Y15
-
-/* vector registers */
-#define A0 %v0
-#define A1 %v1
-#define A2 %v2
-#define A3 %v3
-
-#define B0 %v4
-#define B1 %v5
-#define B2 %v6
-#define B3 %v7
-
-#define C0 %v8
-#define C1 %v9
-#define C2 %v10
-#define C3 %v11
-
-#define D0 %v12
-#define D1 %v13
-#define D2 %v14
-#define D3 %v15
-
-#define E0 %v16
-#define E1 %v17
-#define E2 %v18
-#define E3 %v19
-
-#define F0 %v20
-#define F1 %v21
-#define F2 %v22
-#define F3 %v23
-
-#define G0 %v24
-#define G1 %v25
-#define G2 %v26
-#define G3 %v27
-
-#define H0 %v28
-#define H1 %v29
-#define H2 %v30
-#define H3 %v31
-
-#define IO0 E0
-#define IO1 E1
-#define IO2 E2
-#define IO3 E3
-#define IO4 F0
-#define IO5 F1
-#define IO6 F2
-#define IO7 F3
-
-#define S0 G0
-#define S1 G1
-#define S2 G2
-#define S3 G3
-
-#define TMP0 H0
-#define TMP1 H1
-#define TMP2 H2
-#define TMP3 H3
-
-#define X0 A0
-#define X1 A1
-#define X2 A2
-#define X3 A3
-#define X4 B0
-#define X5 B1
-#define X6 B2
-#define X7 B3
-#define X8 C0
-#define X9 C1
-#define X10 C2
-#define X11 C3
-#define X12 D0
-#define X13 D1
-#define X14 D2
-#define X15 D3
-
-#define Y0 E0
-#define Y1 E1
-#define Y2 E2
-#define Y3 E3
-#define Y4 F0
-#define Y5 F1
-#define Y6 F2
-#define Y7 F3
-#define Y8 G0
-#define Y9 G1
-#define Y10 G2
-#define Y11 G3
-#define Y12 H0
-#define Y13 H1
-#define Y14 H2
-#define Y15 H3
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-#define _ /*_*/
-
-#define START_STACK(last_r) \
-	lgr %r0, %r15; \
-	lghi %r1, ~15; \
-	stmg %r6, last_r, 6 * 8(%r15); \
-	aghi %r0, -STACK_MAX; \
-	ngr %r0, %r1; \
-	lgr %r1, %r15; \
-	cfi_def_cfa_register(1); \
-	lgr %r15, %r0; \
-	stg %r1, 0(%r15); \
-	cfi_cfa_on_stack(0, 0); \
-	std %f8, STACK_F8(%r15); \
-	std %f9, STACK_F9(%r15); \
-	std %f10, STACK_F10(%r15); \
-	std %f11, STACK_F11(%r15); \
-	std %f12, STACK_F12(%r15); \
-	std %f13, STACK_F13(%r15); \
-	std %f14, STACK_F14(%r15); \
-	std %f15, STACK_F15(%r15);
-
-#define END_STACK(last_r) \
-	lg %r1, 0(%r15); \
-	ld %f8, STACK_F8(%r15); \
-	ld %f9, STACK_F9(%r15); \
-	ld %f10, STACK_F10(%r15); \
-	ld %f11, STACK_F11(%r15); \
-	ld %f12, STACK_F12(%r15); \
-	ld %f13, STACK_F13(%r15); \
-	ld %f14, STACK_F14(%r15); \
-	ld %f15, STACK_F15(%r15); \
-	lmg %r6, last_r, 6 * 8(%r1); \
-	lgr %r15, %r1; \
-	cfi_def_cfa_register(DW_REGNO_SP);
-
-#define PLUS(dst,src) \
-	vaf dst, dst, src;
-
-#define XOR(dst,src) \
-	vx dst, dst, src;
-
-#define ROTATE(v1,c) \
-	verllf v1, v1, (c)(0);
-
-#define WORD_ROTATE(v1,s) \
-	vsldb v1, v1, v1, ((s) * 4);
-
-#define DST_8(OPER, I, J) \
-	OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \
-	OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J);
-
-/**********************************************************************
-  round macros
- **********************************************************************/
-
-/**********************************************************************
-  8-way chacha20 ("vertical")
- **********************************************************************/
-
-#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
-			      x8,x9,x10,x11,x12,x13,x14,x15,\
-			      y0,y1,y2,y3,y4,y5,y6,y7,\
-			      y8,y9,y10,y11,y12,y13,y14,y15,\
-			      op1,op2,op3,op4,op5,op6,op7,op8,\
-			      op9,op10,op11,op12) \
-	op1;							\
-	PLUS(x0, x1); PLUS(x4, x5);				\
-	PLUS(x8, x9); PLUS(x12, x13);				\
-	PLUS(y0, y1); PLUS(y4, y5);				\
-	PLUS(y8, y9); PLUS(y12, y13);				\
-	    op2;						\
-	    XOR(x3, x0);  XOR(x7, x4);				\
-	    XOR(x11, x8); XOR(x15, x12);			\
-	    XOR(y3, y0);  XOR(y7, y4);				\
-	    XOR(y11, y8); XOR(y15, y12);			\
-		op3;						\
-		ROTATE(x3, 16); ROTATE(x7, 16);			\
-		ROTATE(x11, 16); ROTATE(x15, 16);		\
-		ROTATE(y3, 16); ROTATE(y7, 16);			\
-		ROTATE(y11, 16); ROTATE(y15, 16);		\
-	op4;							\
-	PLUS(x2, x3); PLUS(x6, x7);				\
-	PLUS(x10, x11); PLUS(x14, x15);				\
-	PLUS(y2, y3); PLUS(y6, y7);				\
-	PLUS(y10, y11); PLUS(y14, y15);				\
-	    op5;						\
-	    XOR(x1, x2); XOR(x5, x6);				\
-	    XOR(x9, x10); XOR(x13, x14);			\
-	    XOR(y1, y2); XOR(y5, y6);				\
-	    XOR(y9, y10); XOR(y13, y14);			\
-		op6;						\
-		ROTATE(x1,12); ROTATE(x5,12);			\
-		ROTATE(x9,12); ROTATE(x13,12);			\
-		ROTATE(y1,12); ROTATE(y5,12);			\
-		ROTATE(y9,12); ROTATE(y13,12);			\
-	op7;							\
-	PLUS(x0, x1); PLUS(x4, x5);				\
-	PLUS(x8, x9); PLUS(x12, x13);				\
-	PLUS(y0, y1); PLUS(y4, y5);				\
-	PLUS(y8, y9); PLUS(y12, y13);				\
-	    op8;						\
-	    XOR(x3, x0); XOR(x7, x4);				\
-	    XOR(x11, x8); XOR(x15, x12);			\
-	    XOR(y3, y0); XOR(y7, y4);				\
-	    XOR(y11, y8); XOR(y15, y12);			\
-		op9;						\
-		ROTATE(x3,8); ROTATE(x7,8);			\
-		ROTATE(x11,8); ROTATE(x15,8);			\
-		ROTATE(y3,8); ROTATE(y7,8);			\
-		ROTATE(y11,8); ROTATE(y15,8);			\
-	op10;							\
-	PLUS(x2, x3); PLUS(x6, x7);				\
-	PLUS(x10, x11); PLUS(x14, x15);				\
-	PLUS(y2, y3); PLUS(y6, y7);				\
-	PLUS(y10, y11); PLUS(y14, y15);				\
-	    op11;						\
-	    XOR(x1, x2); XOR(x5, x6);				\
-	    XOR(x9, x10); XOR(x13, x14);			\
-	    XOR(y1, y2); XOR(y5, y6);				\
-	    XOR(y9, y10); XOR(y13, y14);			\
-		op12;						\
-		ROTATE(x1,7); ROTATE(x5,7);			\
-		ROTATE(x9,7); ROTATE(x13,7);			\
-		ROTATE(y1,7); ROTATE(y5,7);			\
-		ROTATE(y9,7); ROTATE(y13,7);
-
-#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\
-			 y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \
-	QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
-			      x8,x9,x10,x11,x12,x13,x14,x15,\
-			      y0,y1,y2,y3,y4,y5,y6,y7,\
-			      y8,y9,y10,y11,y12,y13,y14,y15,\
-			      ,,,,,,,,,,,)
-
-#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \
-	  vmrhf tmp0, v0, v1;					\
-	  vmrhf tmp1, v2, v3;					\
-	  vmrlf tmp2, v0, v1;					\
-	  vmrlf   v3, v2, v3;					\
-	  vmrhf tmpa, va, vb;					\
-	  vmrhf tmpb, vc, vd;					\
-	  vmrlf tmpc, va, vb;					\
-	  vmrlf   vd, vc, vd;					\
-	  vpdi v0, tmp0, tmp1, 0;				\
-	  vpdi v1, tmp0, tmp1, 5;				\
-	  vpdi v2, tmp2,   v3, 0;				\
-	  vpdi v3, tmp2,   v3, 5;				\
-	  vpdi va, tmpa, tmpb, 0;				\
-	  vpdi vb, tmpa, tmpb, 5;				\
-	  vpdi vc, tmpc,   vd, 0;				\
-	  vpdi vd, tmpc,   vd, 5;
-
-.balign 8
-.globl __chacha20_s390x_vx_blocks8
-ENTRY (__chacha20_s390x_vx_blocks8)
-	/* input:
-	 *	%r2: input
-	 *	%r3: dst
-	 *	%r4: src
-	 *	%r5: nblks (multiple of 8)
-	 */
-
-	START_STACK(%r8);
-	lgr NBLKS, %r5;
-
-	larl %r7, .Lconsts;
-
-	/* Load counter. */
-	lg %r8, (12 * 4)(INPUT);
-	rllg %r8, %r8, 32;
-
-.balign 4
-	/* Process eight chacha20 blocks per loop. */
-.Lloop8:
-	vlm Y0, Y3, 0(INPUT);
-
-	slgfi NBLKS, 8;
-	lghi ROUND, (20 / 2);
-
-	/* Construct counter vectors X12/X13 & Y12/Y13. */
-	vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7);
-	vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7);
-	vrepf Y12, Y3, 0;
-	vrepf Y13, Y3, 1;
-	vaccf X5, Y12, X4;
-	vaccf Y5, Y12, Y4;
-	vaf X12, Y12, X4;
-	vaf Y12, Y12, Y4;
-	vaf X13, Y13, X5;
-	vaf Y13, Y13, Y5;
-
-	vrepf X0, Y0, 0;
-	vrepf X1, Y0, 1;
-	vrepf X2, Y0, 2;
-	vrepf X3, Y0, 3;
-	vrepf X4, Y1, 0;
-	vrepf X5, Y1, 1;
-	vrepf X6, Y1, 2;
-	vrepf X7, Y1, 3;
-	vrepf X8, Y2, 0;
-	vrepf X9, Y2, 1;
-	vrepf X10, Y2, 2;
-	vrepf X11, Y2, 3;
-	vrepf X14, Y3, 2;
-	vrepf X15, Y3, 3;
-
-	/* Store counters for blocks 0-7. */
-	vstm X12, X13, (STACK_CTR + 0 * 16)(%r15);
-	vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15);
-
-	vlr Y0, X0;
-	vlr Y1, X1;
-	vlr Y2, X2;
-	vlr Y3, X3;
-	vlr Y4, X4;
-	vlr Y5, X5;
-	vlr Y6, X6;
-	vlr Y7, X7;
-	vlr Y8, X8;
-	vlr Y9, X9;
-	vlr Y10, X10;
-	vlr Y11, X11;
-	vlr Y14, X14;
-	vlr Y15, X15;
-
-	/* Update and store counter. */
-	agfi %r8, 8;
-	rllg %r5, %r8, 32;
-	stg %r5, (12 * 4)(INPUT);
-
-.balign 4
-.Lround2_8:
-	QUARTERROUND4_V8(X0, X4,  X8, X12,   X1, X5,  X9, X13,
-			 X2, X6, X10, X14,   X3, X7, X11, X15,
-			 Y0, Y4,  Y8, Y12,   Y1, Y5,  Y9, Y13,
-			 Y2, Y6, Y10, Y14,   Y3, Y7, Y11, Y15);
-	QUARTERROUND4_V8(X0, X5, X10, X15,   X1, X6, X11, X12,
-			 X2, X7,  X8, X13,   X3, X4,  X9, X14,
-			 Y0, Y5, Y10, Y15,   Y1, Y6, Y11, Y12,
-			 Y2, Y7,  Y8, Y13,   Y3, Y4,  Y9, Y14);
-	brctg ROUND, .Lround2_8;
-
-	/* Store blocks 4-7. */
-	vstm Y0, Y15, STACK_Y0_Y15(%r15);
-
-	/* Load counters for blocks 0-3. */
-	vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15);
-
-	lghi ROUND, 1;
-	j .Lfirst_output_4blks_8;
-
-.balign 4
-.Lsecond_output_4blks_8:
-	/* Load blocks 4-7. */
-	vlm X0, X15, STACK_Y0_Y15(%r15);
-
-	/* Load counters for blocks 4-7. */
-	vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15);
-
-	lghi ROUND, 0;
-
-.balign 4
-	/* Output four chacha20 blocks per loop. */
-.Lfirst_output_4blks_8:
-	vlm Y12, Y15, 0(INPUT);
-	PLUS(X12, Y0);
-	PLUS(X13, Y1);
-	vrepf Y0, Y12, 0;
-	vrepf Y1, Y12, 1;
-	vrepf Y2, Y12, 2;
-	vrepf Y3, Y12, 3;
-	vrepf Y4, Y13, 0;
-	vrepf Y5, Y13, 1;
-	vrepf Y6, Y13, 2;
-	vrepf Y7, Y13, 3;
-	vrepf Y8, Y14, 0;
-	vrepf Y9, Y14, 1;
-	vrepf Y10, Y14, 2;
-	vrepf Y11, Y14, 3;
-	vrepf Y14, Y15, 2;
-	vrepf Y15, Y15, 3;
-	PLUS(X0, Y0);
-	PLUS(X1, Y1);
-	PLUS(X2, Y2);
-	PLUS(X3, Y3);
-	PLUS(X4, Y4);
-	PLUS(X5, Y5);
-	PLUS(X6, Y6);
-	PLUS(X7, Y7);
-	PLUS(X8, Y8);
-	PLUS(X9, Y9);
-	PLUS(X10, Y10);
-	PLUS(X11, Y11);
-	PLUS(X14, Y14);
-	PLUS(X15, Y15);
-
-	vl Y15, (.Lbswap32 - .Lconsts)(%r7);
-	TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7,
-			Y9, Y10, Y11, Y12, Y13, Y14);
-	TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15,
-			Y9, Y10, Y11, Y12, Y13, Y14);
-
-	vlm Y0, Y14, 0(SRC);
-	vperm X0, X0, X0, Y15;
-	vperm X1, X1, X1, Y15;
-	vperm X2, X2, X2, Y15;
-	vperm X3, X3, X3, Y15;
-	vperm X4, X4, X4, Y15;
-	vperm X5, X5, X5, Y15;
-	vperm X6, X6, X6, Y15;
-	vperm X7, X7, X7, Y15;
-	vperm X8, X8, X8, Y15;
-	vperm X9, X9, X9, Y15;
-	vperm X10, X10, X10, Y15;
-	vperm X11, X11, X11, Y15;
-	vperm X12, X12, X12, Y15;
-	vperm X13, X13, X13, Y15;
-	vperm X14, X14, X14, Y15;
-	vperm X15, X15, X15, Y15;
-	vl Y15, (15 * 16)(SRC);
-
-	XOR(Y0, X0);
-	XOR(Y1, X4);
-	XOR(Y2, X8);
-	XOR(Y3, X12);
-	XOR(Y4, X1);
-	XOR(Y5, X5);
-	XOR(Y6, X9);
-	XOR(Y7, X13);
-	XOR(Y8, X2);
-	XOR(Y9, X6);
-	XOR(Y10, X10);
-	XOR(Y11, X14);
-	XOR(Y12, X3);
-	XOR(Y13, X7);
-	XOR(Y14, X11);
-	XOR(Y15, X15);
-	vstm Y0, Y15, 0(DST);
-
-	aghi SRC, 256;
-	aghi DST, 256;
-
-	clgije ROUND, 1, .Lsecond_output_4blks_8;
-
-	clgijhe NBLKS, 8, .Lloop8;
-
-
-	END_STACK(%r8);
-	xgr %r2, %r2;
-	br %r14;
-END (__chacha20_s390x_vx_blocks8)
-
-#endif /* HAVE_S390_VX_ASM_SUPPORT */
diff --git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h
deleted file mode 100644
index 0c6abf77e8..0000000000
--- a/sysdeps/s390/s390-64/chacha20_arch.h
+++ /dev/null
@@ -1,45 +0,0 @@
-/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-#include <sys/auxv.h>
-
-unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst,
-					  const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-#ifdef HAVE_S390_VX_ASM_SUPPORT
-  _Static_assert (CHACHA20_BUFSIZE % 8 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 8");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-  if (GLRO(dl_hwcap) & HWCAP_S390_VX)
-    {
-      __chacha20_s390x_vx_blocks8 (state, dst, src,
-				   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-      return;
-    }
-#endif
-  chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 2ccc92b6b8..db28c65799 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -380,7 +380,8 @@ sysdep_routines += xstatconv internal_statvfs \
 		   open_nocancel open64_nocancel \
 		   openat_nocancel openat64_nocancel \
 		   read_nocancel pread64_nocancel \
-		   write_nocancel statx_cp stat_t64_cp
+		   write_nocancel statx_cp stat_t64_cp \
+		   poll_nocancel
 
 sysdep_headers += bits/fcntl-linux.h
 
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index 65d2ceda2c..04c3d37551 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -320,6 +320,7 @@ libc {
     __read_nocancel;
     __pread64_nocancel;
     __close_nocancel;
+    __poll_nocancel;
     __sigtimedwait;
     # functions used by nscd
     __netlink_assert_response;
diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h
index 2c58d5ae2f..71361e7e96 100644
--- a/sysdeps/unix/sysv/linux/not-cancel.h
+++ b/sysdeps/unix/sysv/linux/not-cancel.h
@@ -23,6 +23,7 @@
 #include <sysdep.h>
 #include <errno.h>
 #include <unistd.h>
+#include <sys/poll.h>
 #include <sys/syscall.h>
 #include <sys/wait.h>
 #include <time.h>
@@ -77,6 +78,9 @@ __getrandom_nocancel (void *buf, size_t buflen, unsigned int flags)
 /* Uncancelable fcntl.  */
 __typeof (__fcntl) __fcntl64_nocancel;
 
+/* Uncancelable poll.  */
+__typeof (__poll) __poll_nocancel;
+
 #if IS_IN (libc) || IS_IN (rtld)
 hidden_proto (__open_nocancel)
 hidden_proto (__open64_nocancel)
@@ -87,6 +91,7 @@ hidden_proto (__pread64_nocancel)
 hidden_proto (__write_nocancel)
 hidden_proto (__close_nocancel)
 hidden_proto (__fcntl64_nocancel)
+hidden_proto (__poll_nocancel)
 #endif
 
 #endif /* NOT_CANCEL_H  */
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/unix/sysv/linux/poll_nocancel.c
similarity index 68%
rename from sysdeps/generic/chacha20_arch.h
rename to sysdeps/unix/sysv/linux/poll_nocancel.c
index 1b4559ccbc..462e6f8464 100644
--- a/sysdeps/generic/chacha20_arch.h
+++ b/sysdeps/unix/sysv/linux/poll_nocancel.c
@@ -1,5 +1,5 @@
-/* Chacha20 implementation, generic interface for encrypt.
-   Copyright (C) 2022 Free Software Foundation, Inc.
+/* Linux poll syscall implementation -- non-cancellable.
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -16,9 +16,13 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
+#include <unistd.h>
+#include <sysdep-cancel.h>
+#include <not-cancel.h>
+
+int
+__poll_nocancel (struct pollfd *fds, nfds_t nfds, int timeout)
 {
-  chacha20_crypt_generic (state, dst, src, bytes);
+  return INLINE_SYSCALL_CALL (poll, fds, nfds, timeout);
 }
+hidden_def (__poll_nocancel)
diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c
index 0326ebb767..c8a9ed2d40 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.c
+++ b/sysdeps/unix/sysv/linux/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -26,13 +25,4 @@ __glibc_tls_internal_free (void)
   struct pthread *self = THREAD_SELF;
   free (self->tls_state.strsignal_buf);
   free (self->tls_state.strerror_l_buf);
-
-  if (self->tls_state.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-         cached it won't leak any data.  */
-      explicit_bzero (self->tls_state.rand_state,
-		      sizeof (*self->tls_state.rand_state));
-      free (self->tls_state.rand_state);
-    }
 }
diff --git a/sysdeps/unix/sysv/linux/tls-internal.h b/sysdeps/unix/sysv/linux/tls-internal.h
index ebc65d896a..2ebe977802 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.h
+++ b/sysdeps/unix/sysv/linux/tls-internal.h
@@ -28,7 +28,6 @@ __glibc_tls_internal (void)
   return &THREAD_SELF->tls_state;
 }
 
-/* Reset the arc4random TCB state on fork.  */
 extern void __glibc_tls_internal_free (void) attribute_hidden;
 
 #endif
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 1178475d75..c19bef2dec 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -5,13 +5,6 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += link-defines.sym
 endif
 
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
-  chacha20-amd64-sse2 \
-  chacha20-amd64-avx2 \
-  # sysdep_routines
-endif
-
 ifeq ($(subdir),gmon)
 sysdep_routines += _mcount
 # We cannot compile _mcount.S with -pg because that would create
diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S
deleted file mode 100644
index aefd1cdbd0..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-avx2.S
+++ /dev/null
@@ -1,328 +0,0 @@
-/* Optimized AVX2 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-amd64-avx2.S  -  AVX2 implementation of ChaCha20 cipher
-
-   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-
-#ifdef PIC
-#  define rRIP (%rip)
-#else
-#  define rRIP
-#endif
-
-/* register macros */
-#define INPUT %rdi
-#define DST   %rsi
-#define SRC   %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (32)
-#define STACK_VEC_X13 (32 + STACK_VEC_X12)
-#define STACK_TMP     (32 + STACK_VEC_X13)
-#define STACK_TMP1    (32 + STACK_TMP)
-
-#define STACK_MAX     (32 + STACK_TMP1)
-
-/* vector registers */
-#define X0 %ymm0
-#define X1 %ymm1
-#define X2 %ymm2
-#define X3 %ymm3
-#define X4 %ymm4
-#define X5 %ymm5
-#define X6 %ymm6
-#define X7 %ymm7
-#define X8 %ymm8
-#define X9 %ymm9
-#define X10 %ymm10
-#define X11 %ymm11
-#define X12 %ymm12
-#define X13 %ymm13
-#define X14 %ymm14
-#define X15 %ymm15
-
-#define X0h %xmm0
-#define X1h %xmm1
-#define X2h %xmm2
-#define X3h %xmm3
-#define X4h %xmm4
-#define X5h %xmm5
-#define X6h %xmm6
-#define X7h %xmm7
-#define X8h %xmm8
-#define X9h %xmm9
-#define X10h %xmm10
-#define X11h %xmm11
-#define X12h %xmm12
-#define X13h %xmm13
-#define X14h %xmm14
-#define X15h %xmm15
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0,x1,x2,x3,t1,t2) \
-	vpunpckhdq x1, x0, t2; \
-	vpunpckldq x1, x0, x0; \
-	\
-	vpunpckldq x3, x2, t1; \
-	vpunpckhdq x3, x2, x2; \
-	\
-	vpunpckhqdq t1, x0, x1; \
-	vpunpcklqdq t1, x0, x0; \
-	\
-	vpunpckhqdq x2, t2, x3; \
-	vpunpcklqdq x2, t2, x2;
-
-/* 2x2 128-bit matrix transpose */
-#define transpose_16byte_2x2(x0,x1,t1) \
-	vmovdqa    x0, t1; \
-	vperm2i128 $0x20, x1, x0, x0; \
-	vperm2i128 $0x31, x1, t1, x1;
-
-/**********************************************************************
-  8-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp)	\
-	vpsrld $(32 - (c)), v1, tmp;	\
-	vpslld $(c), v1, v1;		\
-	vpaddb tmp, v1, v1;		\
-	vpsrld $(32 - (c)), v2, tmp;	\
-	vpslld $(c), v2, v2;		\
-	vpaddb tmp, v2, v2;
-
-#define ROTATE_SHUF_2(v1,v2,shuf)	\
-	vpshufb shuf, v1, v1;		\
-	vpshufb shuf, v2, v2;
-
-#define XOR(ds,s) \
-	vpxor s, ds, ds;
-
-#define PLUS(ds,s) \
-	vpaddd s, ds, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\
-		      interleave_op1,interleave_op2,\
-		      interleave_op3,interleave_op4)		\
-	vbroadcasti128 .Lshuf_rol16 rRIP, tmp1;			\
-		interleave_op1;					\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE_SHUF_2(d1, d2, tmp1);			\
-		interleave_op2;					\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2, 12, tmp1);				\
-	vbroadcasti128 .Lshuf_rol8 rRIP, tmp1;			\
-		interleave_op3;					\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE_SHUF_2(d1, d2, tmp1);			\
-		interleave_op4;					\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2,  7, tmp1);
-
-	.section .text.avx2, "ax", @progbits
-	.align 32
-chacha20_data:
-L(shuf_rol16):
-	.byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
-L(shuf_rol8):
-	.byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
-L(inc_counter):
-	.byte 0,1,2,3,4,5,6,7
-L(unsigned_cmp):
-	.long 0x80000000
-
-	.hidden __chacha20_avx2_blocks8
-ENTRY (__chacha20_avx2_blocks8)
-	/* input:
-	 *	%rdi: input
-	 *	%rsi: dst
-	 *	%rdx: src
-	 *	%rcx: nblks (multiple of 8)
-	 */
-	vzeroupper;
-
-	pushq %rbp;
-	cfi_adjust_cfa_offset(8);
-	cfi_rel_offset(rbp, 0)
-	movq %rsp, %rbp;
-	cfi_def_cfa_register(rbp);
-
-	subq $STACK_MAX, %rsp;
-	andq $~31, %rsp;
-
-L(loop8):
-	mov $20, ROUND;
-
-	/* Construct counter vectors X12 and X13 */
-	vpmovzxbd L(inc_counter) rRIP, X0;
-	vpbroadcastd L(unsigned_cmp) rRIP, X2;
-	vpbroadcastd (12 * 4)(INPUT), X12;
-	vpbroadcastd (13 * 4)(INPUT), X13;
-	vpaddd X0, X12, X12;
-	vpxor X2, X0, X0;
-	vpxor X2, X12, X1;
-	vpcmpgtd X1, X0, X0;
-	vpsubd X0, X13, X13;
-	vmovdqa X12, (STACK_VEC_X12)(%rsp);
-	vmovdqa X13, (STACK_VEC_X13)(%rsp);
-
-	/* Load vectors */
-	vpbroadcastd (0 * 4)(INPUT), X0;
-	vpbroadcastd (1 * 4)(INPUT), X1;
-	vpbroadcastd (2 * 4)(INPUT), X2;
-	vpbroadcastd (3 * 4)(INPUT), X3;
-	vpbroadcastd (4 * 4)(INPUT), X4;
-	vpbroadcastd (5 * 4)(INPUT), X5;
-	vpbroadcastd (6 * 4)(INPUT), X6;
-	vpbroadcastd (7 * 4)(INPUT), X7;
-	vpbroadcastd (8 * 4)(INPUT), X8;
-	vpbroadcastd (9 * 4)(INPUT), X9;
-	vpbroadcastd (10 * 4)(INPUT), X10;
-	vpbroadcastd (11 * 4)(INPUT), X11;
-	vpbroadcastd (14 * 4)(INPUT), X14;
-	vpbroadcastd (15 * 4)(INPUT), X15;
-	vmovdqa X15, (STACK_TMP)(%rsp);
-
-L(round2):
-	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X15,,,,)
-	vmovdqa (STACK_TMP)(%rsp), X15;
-	vmovdqa X8, (STACK_TMP)(%rsp);
-	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,,,,)
-	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,,,,)
-	vmovdqa (STACK_TMP)(%rsp), X8;
-	vmovdqa X15, (STACK_TMP)(%rsp);
-	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X15,,,,)
-	sub $2, ROUND;
-	jnz L(round2);
-
-	vmovdqa X8, (STACK_TMP1)(%rsp);
-
-	/* tmp := X15 */
-	vpbroadcastd (0 * 4)(INPUT), X15;
-	PLUS(X0, X15);
-	vpbroadcastd (1 * 4)(INPUT), X15;
-	PLUS(X1, X15);
-	vpbroadcastd (2 * 4)(INPUT), X15;
-	PLUS(X2, X15);
-	vpbroadcastd (3 * 4)(INPUT), X15;
-	PLUS(X3, X15);
-	vpbroadcastd (4 * 4)(INPUT), X15;
-	PLUS(X4, X15);
-	vpbroadcastd (5 * 4)(INPUT), X15;
-	PLUS(X5, X15);
-	vpbroadcastd (6 * 4)(INPUT), X15;
-	PLUS(X6, X15);
-	vpbroadcastd (7 * 4)(INPUT), X15;
-	PLUS(X7, X15);
-	transpose_4x4(X0, X1, X2, X3, X8, X15);
-	transpose_4x4(X4, X5, X6, X7, X8, X15);
-	vmovdqa (STACK_TMP1)(%rsp), X8;
-	transpose_16byte_2x2(X0, X4, X15);
-	transpose_16byte_2x2(X1, X5, X15);
-	transpose_16byte_2x2(X2, X6, X15);
-	transpose_16byte_2x2(X3, X7, X15);
-	vmovdqa (STACK_TMP)(%rsp), X15;
-	vmovdqu X0, (64 * 0 + 16 * 0)(DST)
-	vmovdqu X1, (64 * 1 + 16 * 0)(DST)
-	vpbroadcastd (8 * 4)(INPUT), X0;
-	PLUS(X8, X0);
-	vpbroadcastd (9 * 4)(INPUT), X0;
-	PLUS(X9, X0);
-	vpbroadcastd (10 * 4)(INPUT), X0;
-	PLUS(X10, X0);
-	vpbroadcastd (11 * 4)(INPUT), X0;
-	PLUS(X11, X0);
-	vmovdqa (STACK_VEC_X12)(%rsp), X0;
-	PLUS(X12, X0);
-	vmovdqa (STACK_VEC_X13)(%rsp), X0;
-	PLUS(X13, X0);
-	vpbroadcastd (14 * 4)(INPUT), X0;
-	PLUS(X14, X0);
-	vpbroadcastd (15 * 4)(INPUT), X0;
-	PLUS(X15, X0);
-	vmovdqu X2, (64 * 2 + 16 * 0)(DST)
-	vmovdqu X3, (64 * 3 + 16 * 0)(DST)
-
-	/* Update counter */
-	addq $8, (12 * 4)(INPUT);
-
-	transpose_4x4(X8, X9, X10, X11, X0, X1);
-	transpose_4x4(X12, X13, X14, X15, X0, X1);
-	vmovdqu X4, (64 * 4 + 16 * 0)(DST)
-	vmovdqu X5, (64 * 5 + 16 * 0)(DST)
-	transpose_16byte_2x2(X8, X12, X0);
-	transpose_16byte_2x2(X9, X13, X0);
-	transpose_16byte_2x2(X10, X14, X0);
-	transpose_16byte_2x2(X11, X15, X0);
-	vmovdqu X6,  (64 * 6 + 16 * 0)(DST)
-	vmovdqu X7,  (64 * 7 + 16 * 0)(DST)
-	vmovdqu X8,  (64 * 0 + 16 * 2)(DST)
-	vmovdqu X9,  (64 * 1 + 16 * 2)(DST)
-	vmovdqu X10, (64 * 2 + 16 * 2)(DST)
-	vmovdqu X11, (64 * 3 + 16 * 2)(DST)
-	vmovdqu X12, (64 * 4 + 16 * 2)(DST)
-	vmovdqu X13, (64 * 5 + 16 * 2)(DST)
-	vmovdqu X14, (64 * 6 + 16 * 2)(DST)
-	vmovdqu X15, (64 * 7 + 16 * 2)(DST)
-
-	sub $8, NBLKS;
-	lea (8 * 64)(DST), DST;
-	lea (8 * 64)(SRC), SRC;
-	jnz L(loop8);
-
-	vzeroupper;
-
-	/* eax zeroed by round loop. */
-	leave;
-	cfi_adjust_cfa_offset(-8)
-	cfi_def_cfa_register(%rsp);
-	ret;
-	int3;
-END(__chacha20_avx2_blocks8)
diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S
deleted file mode 100644
index 351a1109c6..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-sse2.S
+++ /dev/null
@@ -1,311 +0,0 @@
-/* Optimized SSE2 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-amd64-ssse3.S  -  SSSE3 implementation of ChaCha20 cipher
-
-   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-#include <isa-level.h>
-
-#if MINIMUM_X86_ISA_LEVEL <= 2
-
-#ifdef PIC
-#  define rRIP (%rip)
-#else
-#  define rRIP
-#endif
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
-        ret; int3;
-
-/* register macros */
-#define INPUT %rdi
-#define DST   %rsi
-#define SRC   %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (16)
-#define STACK_VEC_X13 (16 + STACK_VEC_X12)
-#define STACK_TMP     (16 + STACK_VEC_X13)
-#define STACK_TMP1    (16 + STACK_TMP)
-#define STACK_TMP2    (16 + STACK_TMP1)
-
-#define STACK_MAX     (16 + STACK_TMP2)
-
-/* vector registers */
-#define X0 %xmm0
-#define X1 %xmm1
-#define X2 %xmm2
-#define X3 %xmm3
-#define X4 %xmm4
-#define X5 %xmm5
-#define X6 %xmm6
-#define X7 %xmm7
-#define X8 %xmm8
-#define X9 %xmm9
-#define X10 %xmm10
-#define X11 %xmm11
-#define X12 %xmm12
-#define X13 %xmm13
-#define X14 %xmm14
-#define X15 %xmm15
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \
-	movdqa    x0, t2; \
-	punpckhdq x1, t2; \
-	punpckldq x1, x0; \
-	\
-	movdqa    x2, t1; \
-	punpckldq x3, t1; \
-	punpckhdq x3, x2; \
-	\
-	movdqa     x0, x1; \
-	punpckhqdq t1, x1; \
-	punpcklqdq t1, x0; \
-	\
-	movdqa     t2, x3; \
-	punpckhqdq x2, x3; \
-	punpcklqdq x2, t2; \
-	movdqa     t2, x2;
-
-/* fill xmm register with 32-bit value from memory */
-#define PBROADCASTD(mem32, xreg) \
-	movd mem32, xreg; \
-	pshufd $0, xreg, xreg;
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp1,tmp2)	\
-	movdqa v1, tmp1; 		\
-	movdqa v2, tmp2; 		\
-	psrld $(32 - (c)), v1;		\
-	pslld $(c), tmp1;		\
-	paddb tmp1, v1;			\
-	psrld $(32 - (c)), v2;		\
-	pslld $(c), tmp2;		\
-	paddb tmp2, v2;
-
-#define XOR(ds,s) \
-	pxor s, ds;
-
-#define PLUS(ds,s) \
-	paddd s, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2)	\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE2(d1, d2, 16, tmp1, tmp2);			\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2, 12, tmp1, tmp2);			\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE2(d1, d2, 8, tmp1, tmp2);			\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2,  7, tmp1, tmp2);
-
-	.section .text.sse2,"ax",@progbits
-
-chacha20_data:
-	.align 16
-L(counter1):
-	.long 1,0,0,0
-L(inc_counter):
-	.long 0,1,2,3
-L(unsigned_cmp):
-	.long 0x80000000,0x80000000,0x80000000,0x80000000
-
-	.hidden __chacha20_sse2_blocks4
-ENTRY (__chacha20_sse2_blocks4)
-	/* input:
-	 *	%rdi: input
-	 *	%rsi: dst
-	 *	%rdx: src
-	 *	%rcx: nblks (multiple of 4)
-	 */
-
-	pushq %rbp;
-	cfi_adjust_cfa_offset(8);
-	cfi_rel_offset(rbp, 0)
-	movq %rsp, %rbp;
-	cfi_def_cfa_register(%rbp);
-
-	subq $STACK_MAX, %rsp;
-	andq $~15, %rsp;
-
-L(loop4):
-	mov $20, ROUND;
-
-	/* Construct counter vectors X12 and X13 */
-	movdqa L(inc_counter) rRIP, X0;
-	movdqa L(unsigned_cmp) rRIP, X2;
-	PBROADCASTD((12 * 4)(INPUT), X12);
-	PBROADCASTD((13 * 4)(INPUT), X13);
-	paddd X0, X12;
-	movdqa X12, X1;
-	pxor X2, X0;
-	pxor X2, X1;
-	pcmpgtd X1, X0;
-	psubd X0, X13;
-	movdqa X12, (STACK_VEC_X12)(%rsp);
-	movdqa X13, (STACK_VEC_X13)(%rsp);
-
-	/* Load vectors */
-	PBROADCASTD((0 * 4)(INPUT), X0);
-	PBROADCASTD((1 * 4)(INPUT), X1);
-	PBROADCASTD((2 * 4)(INPUT), X2);
-	PBROADCASTD((3 * 4)(INPUT), X3);
-	PBROADCASTD((4 * 4)(INPUT), X4);
-	PBROADCASTD((5 * 4)(INPUT), X5);
-	PBROADCASTD((6 * 4)(INPUT), X6);
-	PBROADCASTD((7 * 4)(INPUT), X7);
-	PBROADCASTD((8 * 4)(INPUT), X8);
-	PBROADCASTD((9 * 4)(INPUT), X9);
-	PBROADCASTD((10 * 4)(INPUT), X10);
-	PBROADCASTD((11 * 4)(INPUT), X11);
-	PBROADCASTD((14 * 4)(INPUT), X14);
-	PBROADCASTD((15 * 4)(INPUT), X15);
-	movdqa X11, (STACK_TMP)(%rsp);
-	movdqa X15, (STACK_TMP1)(%rsp);
-
-L(round2_4):
-	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X11,X15)
-	movdqa (STACK_TMP)(%rsp), X11;
-	movdqa (STACK_TMP1)(%rsp), X15;
-	movdqa X8, (STACK_TMP)(%rsp);
-	movdqa X9, (STACK_TMP1)(%rsp);
-	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,X9)
-	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,X9)
-	movdqa (STACK_TMP)(%rsp), X8;
-	movdqa (STACK_TMP1)(%rsp), X9;
-	movdqa X11, (STACK_TMP)(%rsp);
-	movdqa X15, (STACK_TMP1)(%rsp);
-	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X11,X15)
-	sub $2, ROUND;
-	jnz L(round2_4);
-
-	/* tmp := X15 */
-	movdqa (STACK_TMP)(%rsp), X11;
-	PBROADCASTD((0 * 4)(INPUT), X15);
-	PLUS(X0, X15);
-	PBROADCASTD((1 * 4)(INPUT), X15);
-	PLUS(X1, X15);
-	PBROADCASTD((2 * 4)(INPUT), X15);
-	PLUS(X2, X15);
-	PBROADCASTD((3 * 4)(INPUT), X15);
-	PLUS(X3, X15);
-	PBROADCASTD((4 * 4)(INPUT), X15);
-	PLUS(X4, X15);
-	PBROADCASTD((5 * 4)(INPUT), X15);
-	PLUS(X5, X15);
-	PBROADCASTD((6 * 4)(INPUT), X15);
-	PLUS(X6, X15);
-	PBROADCASTD((7 * 4)(INPUT), X15);
-	PLUS(X7, X15);
-	PBROADCASTD((8 * 4)(INPUT), X15);
-	PLUS(X8, X15);
-	PBROADCASTD((9 * 4)(INPUT), X15);
-	PLUS(X9, X15);
-	PBROADCASTD((10 * 4)(INPUT), X15);
-	PLUS(X10, X15);
-	PBROADCASTD((11 * 4)(INPUT), X15);
-	PLUS(X11, X15);
-	movdqa (STACK_VEC_X12)(%rsp), X15;
-	PLUS(X12, X15);
-	movdqa (STACK_VEC_X13)(%rsp), X15;
-	PLUS(X13, X15);
-	movdqa X13, (STACK_TMP)(%rsp);
-	PBROADCASTD((14 * 4)(INPUT), X15);
-	PLUS(X14, X15);
-	movdqa (STACK_TMP1)(%rsp), X15;
-	movdqa X14, (STACK_TMP1)(%rsp);
-	PBROADCASTD((15 * 4)(INPUT), X13);
-	PLUS(X15, X13);
-	movdqa X15, (STACK_TMP2)(%rsp);
-
-	/* Update counter */
-	addq $4, (12 * 4)(INPUT);
-
-	TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15);
-	movdqu X0, (64 * 0 + 16 * 0)(DST)
-	movdqu X1, (64 * 1 + 16 * 0)(DST)
-	movdqu X2, (64 * 2 + 16 * 0)(DST)
-	movdqu X3, (64 * 3 + 16 * 0)(DST)
-	TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2);
-	movdqa (STACK_TMP)(%rsp), X13;
-	movdqa (STACK_TMP1)(%rsp), X14;
-	movdqa (STACK_TMP2)(%rsp), X15;
-	movdqu X4, (64 * 0 + 16 * 1)(DST)
-	movdqu X5, (64 * 1 + 16 * 1)(DST)
-	movdqu X6, (64 * 2 + 16 * 1)(DST)
-	movdqu X7, (64 * 3 + 16 * 1)(DST)
-	TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2);
-	movdqu X8,  (64 * 0 + 16 * 2)(DST)
-	movdqu X9,  (64 * 1 + 16 * 2)(DST)
-	movdqu X10, (64 * 2 + 16 * 2)(DST)
-	movdqu X11, (64 * 3 + 16 * 2)(DST)
-	TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2);
-	movdqu X12, (64 * 0 + 16 * 3)(DST)
-	movdqu X13, (64 * 1 + 16 * 3)(DST)
-	movdqu X14, (64 * 2 + 16 * 3)(DST)
-	movdqu X15, (64 * 3 + 16 * 3)(DST)
-
-	sub $4, NBLKS;
-	lea (4 * 64)(DST), DST;
-	lea (4 * 64)(SRC), SRC;
-	jnz L(loop4);
-
-	/* eax zeroed by round loop. */
-	leave;
-	cfi_adjust_cfa_offset(-8)
-	cfi_def_cfa_register(%rsp);
-	ret_spec_stop;
-END (__chacha20_sse2_blocks4)
-
-#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */
diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
deleted file mode 100644
index 6f3784e392..0000000000
--- a/sysdeps/x86_64/chacha20_arch.h
+++ /dev/null
@@ -1,55 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <isa-level.h>
-#include <ldsodefs.h>
-#include <cpu-features.h>
-#include <sys/param.h>
-
-unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4 or 8");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-#if MINIMUM_X86_ISA_LEVEL > 2
-  __chacha20_avx2_blocks8 (state, dst, src,
-			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  const struct cpu_features* cpu_features = __get_cpu_features ();
-
-  /* AVX2 version uses vzeroupper, so disable it if RTM is enabled.  */
-  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
-      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !))
-    __chacha20_avx2_blocks8 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    __chacha20_sse2_blocks4 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#endif
-}
-- 
2.35.1


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3] arc4random: simplify design for better safety
  2022-07-26 11:07           ` [PATCH v3] " Jason A. Donenfeld
@ 2022-07-26 11:11             ` Jason A. Donenfeld
  0 siblings, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 11:11 UTC (permalink / raw)
  To: libc-alpha
  Cc: Adhemerval Zanella Netto, Florian Weimer,
	Cristian Rodríguez, Paul Eggert, Mark Harris, Eric Biggers,
	linux-crypto

As before, I'll paste the main function in question standalone so that
this is a bit easier to read for those not applying this to an actual
tree.

void
__arc4random_buf (void *p, size_t n)
{
  static bool have_getrandom = true, seen_initialized = false;
  int fd;

  if (n == 0)
    return;

  for (;;)
    {
      ssize_t l;

      if (!have_getrandom)
	break;

      l = __getrandom_nocancel (p, n, 0);
      if (l > 0)
	{
	  if ((size_t) l == n)
	    return; /* Done reading, success. */
	  p = (uint8_t *) p + l;
	  n -= l;
	  continue; /* Interrupted by a signal; keep going. */
	}
      else if (l == 0)
	arc4random_getrandom_failure (); /* Weird, should never happen. */
      else if (l == -EINTR)
	continue; /* Interrupted by a signal; keep going. */
      else if (l == -ENOSYS)
	{
	  have_getrandom = false;
	  break; /* No syscall, so fallback to /dev/urandom. */
	}
      arc4random_getrandom_failure (); /* Unknown error, should never happen. */
    }

  if (!seen_initialized)
    {
      struct pollfd pfd = { .events = POLLIN };
      pfd.fd = TEMP_FAILURE_RETRY (
	  __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
      if (pfd.fd < 0)
	arc4random_getrandom_failure ();
      if (TEMP_FAILURE_RETRY (__poll_nocancel (&pfd, 1, -1)) < 0)
	arc4random_getrandom_failure ();
      if (__close_nocancel (pfd.fd) < 0)
	arc4random_getrandom_failure ();
      seen_initialized = true;
    }

  fd = TEMP_FAILURE_RETRY (
      __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY));
  if (fd < 0)
    arc4random_getrandom_failure ();
  do
    {
      ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
      if (l <= 0)
	arc4random_getrandom_failure ();
      p = (uint8_t *) p + l;
      n -= l;
    }
  while (n);
  if (__close_nocancel (fd) < 0)
    arc4random_getrandom_failure ();
}
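
(A minimal sketch for readers without a glibc tree: the retry loop above
can be approximated from an ordinary program with the public getrandom()
wrapper from <sys/random.h>. The name fill_random is hypothetical, and
this version deliberately has no /dev/urandom fallback; it is not the
code from the patch.)

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper, not part of the patch: fill BUF with N random
   bytes using the public getrandom() wrapper.  Returns 0 on success and
   -1 on any error other than EINTR.  */
static int
fill_random (void *buf, size_t n)
{
  uint8_t *p = buf;

  while (n > 0)
    {
      ssize_t l = getrandom (p, n, 0);
      if (l > 0)
        {
          /* Possibly a short read; advance and ask for the rest.  */
          p += l;
          n -= (size_t) l;
        }
      else if (l < 0 && errno == EINTR)
        continue; /* Interrupted by a signal; keep going.  */
      else
        return -1; /* Includes ENOSYS; the caller decides what to do.  */
    }
  return 0;
}
```

Unlike the internal __getrandom_nocancel, the public wrapper reports
errors through errno, hence the different error checks.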

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 11:04         ` Jason A. Donenfeld
  2022-07-26 11:07           ` [PATCH v3] " Jason A. Donenfeld
@ 2022-07-26 11:12           ` Florian Weimer
  2022-07-26 11:20             ` Jason A. Donenfeld
  1 sibling, 1 reply; 81+ messages in thread
From: Florian Weimer @ 2022-07-26 11:12 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: libc-alpha, Adhemerval Zanella Netto, Cristian Rodríguez,
	Paul Eggert, linux-crypto

* Jason A. Donenfeld:

> Hi Florian,
>
> On Tue, Jul 26, 2022 at 11:55:23AM +0200, Florian Weimer wrote:
>> * Jason A. Donenfeld:
>> 
>> > +      pfd.fd = TEMP_FAILURE_RETRY (
>> > +	  __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
>> > +      if (pfd.fd < 0)
>> > +	arc4random_getrandom_failure ();
>> > +      if (__poll (&pfd, 1, -1) < 0)
>> > +	arc4random_getrandom_failure ();
>> > +      if (__close_nocancel (pfd.fd) < 0)
>> > +	arc4random_getrandom_failure ();
>> 
>> What happens if /dev/random is actually /dev/urandom?  Will the poll
>> call fail?
>
> Yes. I'm unsure if you're asking this because it'd be a nice
> simplification to only have to open one fd, or because you're worried
> about confusion. I don't think the confusion problem is one we should
> take too seriously, but if you're concerned, we can always fstat and
> check the maj/min. Seems a bit much, though.

Turning /dev/random into /dev/urandom (e.g. with a symbolic link) used
to be the only way to get some applications working because they tried
to read from /dev/random at a higher rate than the system was estimating
entropy coming in.  We may have to do something differently here if the
failing poll causes too much breakage.

>> Running the benchmark, I see 40% of the time spent in chacha_permute in
>> the kernel, that is really quite odd.  Why doesn't the system call
>> overhead dominate?
>
> Huh, that is interesting. I guess if you're reading 4 bytes for an
> integer, it winds up computing a whole chacha block each time, with half
> of it doing fast key erasure and half of it being returnable to the
> caller. When we later figure out a safer way to buffer, ostensibly this
> will go away. But for now, we really should not prematurely optimize.

Yeah, I can't really argue against that, given that I said before that I
wasn't too worried about the implementation.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 11:12           ` [PATCH v2] " Florian Weimer
@ 2022-07-26 11:20             ` Jason A. Donenfeld
  2022-07-26 11:35               ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 11:20 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, Adhemerval Zanella Netto, Cristian Rodríguez,
	Paul Eggert, linux-crypto

Hey Florian,

On Tue, Jul 26, 2022 at 01:12:28PM +0200, Florian Weimer wrote:
> >> What happens if /dev/random is actually /dev/urandom?  Will the poll
> >> call fail?
> >
> > Yes. I'm unsure if you're asking this because it'd be a nice
> > simplification to only have to open one fd, or because you're worried
> > about confusion. I don't think the confusion problem is one we should
> > take too seriously, but if you're concerned, we can always fstat and
> > check the maj/min. Seems a bit much, though.
> 
> Turning /dev/random into /dev/urandom (e.g. with a symbolic link) used
> to be the only way to get some applications working because they tried
> to read from /dev/random at a higher rate than the system was estimating
> entropy coming in.  We may have to do something differently here if the
> failing poll causes too much breakage.

The "backup plan" would be to sleep-loop-read /proc/sys/kernel/random/entropy_avail
until it passes a certain threshold one time. This might also work on even older
kernels than the poll() trick. But that's pretty darn ugly, so it's not
obvious to me where the cut-off in frustration is, when we throw our
hands up and decide the ugliness is worth it compared to whatever
problems we happen to be facing at the time with the poll() technique.
But at least there is an alternative, should we need it.
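
(To make the idea concrete, a rough sketch of such a sleep-loop follows.
The function names, the 100 ms delay, the retry cap and the threshold
parameter are all assumptions for illustration, and real libc code would
not use stdio for this.)

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: read the integer stored in PATH, normally
   /proc/sys/kernel/random/entropy_avail.  Returns -1 on failure.  */
static long
read_entropy_avail (const char *path)
{
  long bits = -1;
  FILE *f = fopen (path, "r");

  if (f == NULL)
    return -1;
  if (fscanf (f, "%ld", &bits) != 1)
    bits = -1;
  fclose (f);
  return bits;
}

/* Poll PATH until it reports at least THRESHOLD bits, sleeping 100 ms
   between attempts.  Returns 0 once the threshold is reached, 1 if
   MAX_TRIES attempts were not enough, and -1 on a read error.  */
static int
wait_for_entropy (const char *path, long threshold, int max_tries)
{
  const struct timespec delay = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };

  for (int i = 0; i < max_tries; i++)
    {
      long bits = read_entropy_avail (path);
      if (bits < 0)
        return -1;
      if (bits >= threshold)
        return 0;
      nanosleep (&delay, NULL);
    }
  return 1;
}
```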

Jason

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 11:20             ` Jason A. Donenfeld
@ 2022-07-26 11:35               ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 81+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-26 11:35 UTC (permalink / raw)
  To: Jason A. Donenfeld, Florian Weimer
  Cc: libc-alpha, Cristian Rodríguez, Paul Eggert, linux-crypto



On 26/07/22 08:20, Jason A. Donenfeld wrote:
> Hey Florian,
> 
> On Tue, Jul 26, 2022 at 01:12:28PM +0200, Florian Weimer wrote:
>>>> What happens if /dev/random is actually /dev/urandom?  Will the poll
>>>> call fail?
>>>
>>> Yes. I'm unsure if you're asking this because it'd be a nice
>>> simplification to only have to open one fd, or because you're worried
>>> about confusion. I don't think the confusion problem is one we should
>>> take too seriously, but if you're concerned, we can always fstat and
>>> check the maj/min. Seems a bit much, though.
>>
>> Turning /dev/random into /dev/urandom (e.g. with a symbolic link) used
>> to be the only way to get some applications working because they tried
>> to read from /dev/random at a higher rate than the system was estimating
>> entropy coming in.  We may have to do something differently here if the
>> failing poll causes too much breakage.
> 
> The "backup plan" would be to sleep-loop-read /proc/sys/kernel/random/entropy_avail
> until it passes a certain threshold one time. This might also work on even older
> kernels than the poll() trick. But that's pretty darn ugly, so it's not
> obvious to me where the cut-off in frustration is, when we throw our
> hands up and decide the ugliness is worth it compared to whatever
> problems we happen to be facing at the time with the poll() technique.
> But at least there is an alternative, should we need it.

I think the poll trick is way better, although I also think it is very Linux
specific.  Should we move it to Linux sysdeps?

The /proc/sys/kernel/random/entropy_avail approach would require opening
another file descriptor, which I think we should avoid for arc4random if
possible.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-25 23:28     ` [PATCH v2] " Jason A. Donenfeld
                         ` (2 preceding siblings ...)
  2022-07-26  9:55       ` Florian Weimer
@ 2022-07-26 11:33       ` Adhemerval Zanella Netto
  2022-07-26 11:54         ` Jason A. Donenfeld
  3 siblings, 1 reply; 81+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-26 11:33 UTC (permalink / raw)
  To: Jason A. Donenfeld, libc-alpha
  Cc: Florian Weimer, Cristian Rodríguez, Paul Eggert, linux-crypto



On 25/07/22 20:28, Jason A. Donenfeld wrote:
> Rather than buffering 16 MiB of entropy in userspace (by way of
> chacha20), simply call getrandom() every time.
> 
> This approach is doubtlessly slower, for now, but trying to prematurely
> optimize arc4random appears to be leading toward all sorts of nasty
> properties and gotchas. Instead, this patch takes a much more
> conservative approach. The interface is added as a basic loop wrapper
> around getrandom(), and then later, the kernel and libc can work
> together on optimizing that.
> 
> This prevents numerous issues in which userspace is unaware of when it
> really must throw away its buffer, since we avoid buffering
> altogether. Future improvements may include userspace learning more from
> the kernel about when to do that, which might make these sorts of
> chacha20-based optimizations more possible. The current heuristic of 16
> MiB is meaningless garbage that doesn't correspond to anything the
> kernel might know about. So for now, let's just do something
> conservative that we know is correct and won't lead to cryptographic
> issues for users of this function.
> 
> This patch might be considered along the lines of, "optimization is the
> root of all evil," in that the much more complex implementation it
> replaces moves too fast without considering security implications,
> whereas the incremental approach done here is a much safer way of going
> about things. Once this lands, we can take our time in optimizing this
> properly using new interplay between the kernel and userspace.
> 
> getrandom(0) is used, since that's the one that ensures the bytes
> returned are cryptographically secure. But on systems without it, we
> fall back to using /dev/urandom. This is unfortunate because it means
> opening a file descriptor, but there's not much of a choice. Secondly,
> as part of the fallback, in order to get more or less the same
> properties of getrandom(0), we poll on /dev/random, and if the poll
> succeeds at least once, then we assume the RNG is initialized. This is a
> rough approximation, as the ancient "non-blocking pool" initialized
> after the "blocking pool", not before, but it's the best approximation
> we can do.
> 
> The motivation for including arc4random, in the first place, is to have
> source-level compatibility with existing code. That means this patch
> doesn't attempt to litigate the interface itself. It does, however,
> choose a conservative approach for implementing it.
> 
> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Florian Weimer <fweimer@redhat.com>
> Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
> Cc: Paul Eggert <eggert@cs.ucla.edu>
> Cc: linux-crypto@vger.kernel.org
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

There are some missing pieces, like the sysdeps/unix/sysv/linux/tls-internal.h
comment and the sysdeps/generic/tls-internal-struct.h generic piece (it is used
on the hurd build); maybe also change NEWS to state this is not a CSPRNG, and
we definitely need to update the manual. Some comments below.


> ---
>  LICENSES                                      |  23 -
>  include/stdlib.h                              |   3 -
>  stdlib/Makefile                               |   2 -
>  stdlib/arc4random.c                           | 204 ++-----
>  stdlib/arc4random.h                           |  48 --
>  stdlib/chacha20.c                             | 191 ------
>  stdlib/tst-arc4random-chacha20.c              | 167 -----
>  sysdeps/aarch64/Makefile                      |   4 -
>  sysdeps/aarch64/chacha20-aarch64.S            | 314 ----------
>  sysdeps/aarch64/chacha20_arch.h               |  40 --
>  sysdeps/generic/chacha20_arch.h               |  24 -
>  sysdeps/generic/tls-internal.c                |  10 -
>  sysdeps/mach/hurd/_Fork.c                     |   2 -
>  sysdeps/nptl/_Fork.c                          |   2 -
>  .../powerpc/powerpc64/be/multiarch/Makefile   |   4 -
>  .../powerpc64/be/multiarch/chacha20-ppc.c     |   1 -
>  .../powerpc64/be/multiarch/chacha20_arch.h    |  42 --
>  sysdeps/powerpc/powerpc64/power8/Makefile     |   5 -
>  .../powerpc/powerpc64/power8/chacha20-ppc.c   | 256 --------
>  .../powerpc/powerpc64/power8/chacha20_arch.h  |  37 --
>  sysdeps/s390/s390-64/Makefile                 |   6 -
>  sysdeps/s390/s390-64/chacha20-s390x.S         | 573 ------------------
>  sysdeps/s390/s390-64/chacha20_arch.h          |  45 --
>  sysdeps/unix/sysv/linux/tls-internal.c        |  10 -
>  sysdeps/x86_64/Makefile                       |   7 -
>  sysdeps/x86_64/chacha20-amd64-avx2.S          | 328 ----------
>  sysdeps/x86_64/chacha20-amd64-sse2.S          | 311 ----------
>  sysdeps/x86_64/chacha20_arch.h                |  55 --
>  28 files changed, 53 insertions(+), 2661 deletions(-)
>  delete mode 100644 stdlib/arc4random.h
>  delete mode 100644 stdlib/chacha20.c
>  delete mode 100644 stdlib/tst-arc4random-chacha20.c
>  delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S
>  delete mode 100644 sysdeps/aarch64/chacha20_arch.h
>  delete mode 100644 sysdeps/generic/chacha20_arch.h
>  delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
>  delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
>  delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
>  delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
>  delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
>  delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
>  delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
>  delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
>  delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
>  delete mode 100644 sysdeps/x86_64/chacha20_arch.h
> 
> diff --git a/LICENSES b/LICENSES
> index cd04fb6e84..530893b1dc 100644
> --- a/LICENSES
> +++ b/LICENSES
> @@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov>
>   You should have received a copy of the GNU Lesser General Public
>   License along with this library; if not, see
>   <https://www.gnu.org/licenses/>.  */
> -\f
> -sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S,
> -sysdeps/x86_64/chacha20-amd64-avx2.S, and
> -sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and
> -sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt,
> -with the following notices:
> -
> -Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
> -
> -This file is part of Libgcrypt.
> -
> -Libgcrypt is free software; you can redistribute it and/or modify
> -it under the terms of the GNU Lesser General Public License as
> -published by the Free Software Foundation; either version 2.1 of
> -the License, or (at your option) any later version.
> -
> -Libgcrypt is distributed in the hope that it will be useful,
> -but WITHOUT ANY WARRANTY; without even the implied warranty of
> -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -GNU Lesser General Public License for more details.
> -
> -You should have received a copy of the GNU Lesser General Public
> -License along with this program; if not, see <https://www.gnu.org/licenses/>.
> diff --git a/include/stdlib.h b/include/stdlib.h
> index cae7f7cdf8..db51f4a4f6 100644
> --- a/include/stdlib.h
> +++ b/include/stdlib.h
> @@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform;
>  libc_hidden_proto (__arc4random_uniform);
>  extern void __arc4random_buf_internal (void *buffer, size_t len)
>       attribute_hidden;
> -/* Called from the fork function to reinitialize the internal cipher state
> -   in child process.  */
> -extern void __arc4random_fork_subprocess (void) attribute_hidden;
>  
>  extern double __strtod_internal (const char *__restrict __nptr,
>  				 char **__restrict __endptr, int __group)
> diff --git a/stdlib/Makefile b/stdlib/Makefile
> index a900962685..f7b25c1981 100644
> --- a/stdlib/Makefile
> +++ b/stdlib/Makefile
> @@ -246,7 +246,6 @@ tests := \
>    # tests
>  
>  tests-internal := \
> -  tst-arc4random-chacha20 \
>    tst-strtod1i \
>    tst-strtod3 \
>    tst-strtod4 \
> @@ -256,7 +255,6 @@ tests-internal := \
>    # tests-internal
>  
>  tests-static := \
> -  tst-arc4random-chacha20 \
>    tst-secure-getenv \
>    # tests-static
>  
> diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
> index 65547e79aa..80c55cde63 100644
> --- a/stdlib/arc4random.c
> +++ b/stdlib/arc4random.c
> @@ -1,4 +1,4 @@
> -/* Pseudo Random Number Generator based on ChaCha20.
> +/* Pseudo Random Number Generator
>     Copyright (C) 2022 Free Software Foundation, Inc.
>     This file is part of the GNU C Library.
>  
> @@ -16,61 +16,14 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> -#include <arc4random.h>
>  #include <errno.h>
>  #include <not-cancel.h>
>  #include <stdio.h>
>  #include <stdlib.h>
> +#include <sys/poll.h>
>  #include <sys/mman.h>
>  #include <sys/param.h>
>  #include <sys/random.h>
> -#include <tls-internal.h>
> -
> -/* arc4random keeps two counters: 'have' is the current valid bytes not yet
> -   consumed in 'buf' while 'count' is the maximum number of bytes until a
> -   reseed.
> -
> -   Both the initial seed and reseed try to obtain entropy from the kernel
> -   and abort the process if none could be obtained.
> -
> -   The state 'buf' improves the usage of the cipher calls, allowing to call
> -   optimized implementations (if the architecture provides it) and minimize
> -   function call overhead.  */
> -
> -#include <chacha20.c>
> -
> -/* Called from the fork function to reset the state.  */
> -void
> -__arc4random_fork_subprocess (void)
> -{
> -  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
> -  if (state != NULL)
> -    {
> -      explicit_bzero (state, sizeof (*state));
> -      /* Force key init.  */
> -      state->count = -1;
> -    }
> -}
> -
> -/* Return the current thread random state or try to create one if there is
> -   none available.  In the case malloc can not allocate a state, arc4random
> -   will try to get entropy with arc4random_getentropy.  */
> -static struct arc4random_state_t *
> -arc4random_get_state (void)
> -{
> -  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
> -  if (state == NULL)
> -    {
> -      state = malloc (sizeof (struct arc4random_state_t));
> -      if (state != NULL)
> -	{
> -	  /* Force key initialization on first call.  */
> -	  state->count = -1;
> -	  __glibc_tls_internal ()->rand_state = state;
> -	}
> -    }
> -  return state;
> -}
>  
>  static void
>  arc4random_getrandom_failure (void)
> @@ -78,106 +31,70 @@ arc4random_getrandom_failure (void)
>    __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
>  }
>  
> -static void
> -arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen)
> +void
> +__arc4random_buf (void *p, size_t n)
>  {
> -  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
> +  static bool have_getrandom = true, seen_initialized = false;
> +  int fd;

I think it should be reasonable to assume that the getrandom syscall will always be
supported, and using arc4random in an environment with getrandom filtered out does
not make much sense.  We are trying to avoid adding these static syscall checks
where possible.  Also, a plain load/store to set the static have_getrandom
is strictly a race condition, although it should not really matter (we use
relaxed load/store in such optimizations; check
sysdeps/unix/sysv/linux/mips/mips64/getdents64.c).

Also, does it make sense to fall back if we build for a kernel that should
always support getrandom?
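
To make the relaxed load/store point concrete, here is a minimal sketch using
C11 <stdatomic.h>.  It is only an illustration: glibc has its own internal
atomic macros, which this sketch does not use, and the names have_syscall,
syscall_may_exist and mark_syscall_missing are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical cached flag: 1 until the syscall is seen to fail with
   ENOSYS.  An int is used here rather than bool.  */
static atomic_int have_syscall = 1;

/* Relaxed load: only the value of the flag itself matters, so no
   ordering with other memory accesses is requested.  */
static bool
syscall_may_exist (void)
{
  return atomic_load_explicit (&have_syscall, memory_order_relaxed) != 0;
}

/* Relaxed store: several threads may write 0 concurrently, which is
   benign because the value only ever moves from 1 to 0.  */
static void
mark_syscall_missing (void)
{
  atomic_store_explicit (&have_syscall, 0, memory_order_relaxed);
  assert (!syscall_may_exist ());
}
```

The worst case with such a flag is that a thread repeats the failing syscall
once more before it observes the store, which is why relaxed ordering is
enough for this kind of optimization.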

>  
> -  /* Mix optional user provided data.  */
> -  if (rnd != NULL)
> -    {
> -      size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> -      for (size_t i = 0; i < m; i++)
> -	state->buf[i] ^= rnd[i];
> -    }
> -
> -  /* Immediately reinit for backtracking resistance.  */
> -  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
> -  explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> -  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> -}
> -
> -static void
> -arc4random_getentropy (void *rnd, size_t len)
> -{
> -  if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len)
> +  if (n == 0)
>      return;
>  
> -  int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom",
> -						  O_RDONLY | O_CLOEXEC));
> -  if (fd != -1)
> +  for (;;)
>      {
> -      uint8_t *p = rnd;
> -      uint8_t *end = p + len;
> -      do
> -	{
> -	  ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
> -	  if (ret <= 0)
> -	    arc4random_getrandom_failure ();
> -	  p += ret;
> -	}
> -      while (p < end);
> +      ssize_t l;
>  
> -      if (__close_nocancel (fd) == 0)
> -	return;
> -    }
> -  arc4random_getrandom_failure ();
> -}
> +      if (!have_getrandom)
> +	break;
>  
> -/* Check if the thread context STATE should be reseed with kernel entropy
> -   depending of requested LEN bytes.  If there is less than requested,
> -   the state is either initialized or reseeded, otherwise the internal
> -   counter subtract the requested length.  */
> -static void
> -arc4random_check_stir (struct arc4random_state_t *state, size_t len)
> -{
> -  if (state->count <= len || state->count == -1)
> -    {
> -      uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
> -      arc4random_getentropy (rnd, sizeof rnd);
> -
> -      if (state->count == -1)
> -	chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
> -      else
> -	arc4random_rekey (state, rnd, sizeof rnd);
> -
> -      explicit_bzero (rnd, sizeof rnd);
> -
> -      /* Invalidate the buf.  */
> -      state->have = 0;
> -      memset (state->buf, 0, sizeof state->buf);
> -      state->count = CHACHA20_RESEED_SIZE;
> +      l = __getrandom_nocancel (p, n, 0);

Do we need to worry about a potentially uncancellable blocking call here? I guess
using GRND_NONBLOCK does not really help.
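
For reference, a rough sketch of what GRND_NONBLOCK changes, assuming the
public getrandom wrapper from <sys/random.h>: instead of blocking while the
pool is uninitialized, the call fails with EAGAIN, so the caller still has to
decide how to wait.  The helper name try_getrandom_nonblock is hypothetical
and this is not what the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if BUF was filled, 0 if the call would
   have blocked (EAGAIN), and -1 on any other error.  */
static int
try_getrandom_nonblock (void *buf, size_t len)
{
  unsigned char *p = buf;

  assert (buf != NULL);
  while (len > 0)
    {
      ssize_t l = getrandom (p, len, GRND_NONBLOCK);
      if (l > 0)
        {
          /* Partial result; advance and ask for the rest.  */
          p += l;
          len -= (size_t) l;
        }
      else if (l < 0 && errno == EINTR)
        continue;		/* Interrupted by a signal; retry.  */
      else if (l < 0 && errno == EAGAIN)
        return 0;		/* Pool not initialized yet.  */
      else
        return -1;		/* ENOSYS or another unexpected error.  */
    }
  return 1;
}
```

A return value of 0 leaves the caller exactly where the blocking variant
would have been, only without the kernel doing the waiting.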

> +      if (l > 0)
> +	{
> +	  if ((size_t) l == n)

Do we need the cast here?

> +	    return; /* Done reading, success. */

Minor style issue: use two spaces after the period (before the closing */).

> +	  p = (uint8_t *) p + l;
> +	  n -= l;
> +	  continue; /* Interrupted by a signal; keep going. */
> +	}
> +      else if (l == 0)
> +	arc4random_getrandom_failure (); /* Weird, should never happen. */
> +      else if (errno == ENOSYS)
> +	{
> +	  have_getrandom = false;
> +	  break; /* No syscall, so fallback to /dev/urandom. */
> +	}
> +      arc4random_getrandom_failure (); /* Unknown error, should never happen. */
>      }
> -  else
> -    state->count -= len;
> -}
>  
> -void
> -__arc4random_buf (void *buffer, size_t len)
> -{
> -  struct arc4random_state_t *state = arc4random_get_state ();
> -  if (__glibc_unlikely (state == NULL))
> +  if (!seen_initialized)
>      {
> -      arc4random_getentropy (buffer, len);
> -      return;
> +      struct pollfd pfd = { .events = POLLIN };
> +      pfd.fd = TEMP_FAILURE_RETRY (
> +	  __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> +      if (pfd.fd < 0)
> +	arc4random_getrandom_failure ();
> +      if (__poll (&pfd, 1, -1) < 0)
> +	arc4random_getrandom_failure ();

As Florian said, we will need a non-cancellable poll here.  Since you are not setting
a timeout (the call waits indefinitely), I think it would be simplest to just add a
non-cancellable wrapper such as:

  int __ppoll_noncancel_notimeout (struct pollfd *fds, nfds_t nfds)
  {
  #ifndef __NR_ppoll_time64
  # define __NR_ppoll_time64 __NR_ppoll
  #endif
     return INLINE_SYSCALL_CALL (__NR_ppoll_time64, fds, nfds, NULL, NULL, 0);
  }

So we don't need to handle the timeout for 64-bit time_t wrappers.

> +      if (__close_nocancel (pfd.fd) < 0)
> +	arc4random_getrandom_failure ();
> +      seen_initialized = true;

I think we will need to use relaxed atomics, and maybe set the type to int (not sure
if the atomic wrappers work correctly on bool types on all architectures).

>      }
>  
> -  arc4random_check_stir (state, len);
> -  while (len > 0)
> +  fd = TEMP_FAILURE_RETRY (
> +      __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> +  if (fd < 0)
> +    arc4random_getrandom_failure ();
> +  do
>      {
> -      if (state->have > 0)
> -	{
> -	  size_t m = MIN (len, state->have);
> -	  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> -	  memcpy (buffer, ks, m);
> -	  explicit_bzero (ks, m);
> -	  buffer += m;
> -	  len -= m;
> -	  state->have -= m;
> -	}
> -      if (state->have == 0)
> -	arc4random_rekey (state, NULL, 0);
> +      ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
> +      if (l <= 0)
> +	arc4random_getrandom_failure ();
> +      p = (uint8_t *) p + l;
> +      n -= l;
>      }
> +  while (n);
> +  if (__close_nocancel (fd) < 0)
> +    arc4random_getrandom_failure ();
>  }
>  libc_hidden_def (__arc4random_buf)
>  weak_alias (__arc4random_buf, arc4random_buf)
> @@ -186,22 +103,7 @@ uint32_t
>  __arc4random (void)
>  {
>    uint32_t r;
> -
> -  struct arc4random_state_t *state = arc4random_get_state ();
> -  if (__glibc_unlikely (state == NULL))
> -    {
> -      arc4random_getentropy (&r, sizeof (uint32_t));
> -      return r;
> -    }
> -
> -  arc4random_check_stir (state, sizeof (uint32_t));
> -  if (state->have < sizeof (uint32_t))
> -    arc4random_rekey (state, NULL, 0);
> -  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> -  memcpy (&r, ks, sizeof (uint32_t));
> -  memset (ks, 0, sizeof (uint32_t));
> -  state->have -= sizeof (uint32_t);
> -
> +  __arc4random_buf (&r, sizeof (r));
>    return r;
>  }
>  libc_hidden_def (__arc4random)
> diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
> deleted file mode 100644
> index cd39389c19..0000000000
> --- a/stdlib/arc4random.h
> +++ /dev/null
> @@ -1,48 +0,0 @@
> -/* Arc4random definition used on TLS.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#ifndef _CHACHA20_H
> -#define _CHACHA20_H
> -
> -#include <stddef.h>
> -#include <stdint.h>
> -
> -/* Internal ChaCha20 state.  */
> -#define CHACHA20_STATE_LEN	16
> -#define CHACHA20_BLOCK_SIZE	64
> -
> -/* Maximum number bytes until reseed (16 MB).  */
> -#define CHACHA20_RESEED_SIZE	(16 * 1024 * 1024)
> -
> -/* Internal arc4random buffer, used on each feedback step so offer some
> -   backtracking protection and to allow better used of vectorized
> -   chacha20 implementations.  */
> -#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
> -
> -_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE,
> -		"CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE");
> -
> -struct arc4random_state_t
> -{
> -  uint32_t ctx[CHACHA20_STATE_LEN];
> -  size_t have;
> -  size_t count;
> -  uint8_t buf[CHACHA20_BUFSIZE];
> -};
> -
> -#endif
> diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
> deleted file mode 100644
> index 2745a81315..0000000000
> --- a/stdlib/chacha20.c
> +++ /dev/null
> @@ -1,191 +0,0 @@
> -/* Generic ChaCha20 implementation (used on arc4random).
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <array_length.h>
> -#include <endian.h>
> -#include <stddef.h>
> -#include <stdint.h>
> -#include <string.h>
> -
> -/* 32-bit stream position, then 96-bit nonce.  */
> -#define CHACHA20_IV_SIZE	16
> -#define CHACHA20_KEY_SIZE	32
> -
> -#define CHACHA20_STATE_LEN	16
> -
> -/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final
> -   XOR of the keystream with the plaintext because the plaintext is a
> -   stream of zeros.  */
> -
> -enum chacha20_constants
> -{
> -  CHACHA20_CONSTANT_EXPA = 0x61707865U,
> -  CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
> -  CHACHA20_CONSTANT_2_BY = 0x79622d32U,
> -  CHACHA20_CONSTANT_TE_K = 0x6b206574U
> -};
> -
> -static inline uint32_t
> -read_unaligned_32 (const uint8_t *p)
> -{
> -  uint32_t r;
> -  memcpy (&r, p, sizeof (r));
> -  return r;
> -}
> -
> -static inline void
> -write_unaligned_32 (uint8_t *p, uint32_t v)
> -{
> -  memcpy (p, &v, sizeof (v));
> -}
> -
> -#if __BYTE_ORDER == __BIG_ENDIAN
> -# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
> -# define set_state(v)		__builtin_bswap32 ((v))
> -#else
> -# define read_unaligned_le32(p) read_unaligned_32 ((p))
> -# define set_state(v)		(v)
> -#endif
> -
> -static inline void
> -chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
> -{
> -  state[0]  = CHACHA20_CONSTANT_EXPA;
> -  state[1]  = CHACHA20_CONSTANT_ND_3;
> -  state[2]  = CHACHA20_CONSTANT_2_BY;
> -  state[3]  = CHACHA20_CONSTANT_TE_K;
> -
> -  state[4]  = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
> -  state[5]  = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
> -  state[6]  = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
> -  state[7]  = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
> -  state[8]  = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
> -  state[9]  = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
> -  state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
> -  state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
> -
> -  state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
> -  state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
> -  state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
> -  state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
> -}
> -
> -static inline uint32_t
> -rotl32 (unsigned int shift, uint32_t word)
> -{
> -  return (word << (shift & 31)) | (word >> ((-shift) & 31));
> -}
> -
> -static void
> -state_final (const uint8_t *src, uint8_t *dst, uint32_t v)
> -{
> -#ifdef CHACHA20_XOR_FINAL
> -  v ^= read_unaligned_32 (src);
> -#endif
> -  write_unaligned_32 (dst, v);
> -}
> -
> -static inline void
> -chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src)
> -{
> -  uint32_t x0, x1, x2, x3, x4, x5, x6, x7;
> -  uint32_t x8, x9, x10, x11, x12, x13, x14, x15;
> -
> -  x0 = state[0];
> -  x1 = state[1];
> -  x2 = state[2];
> -  x3 = state[3];
> -  x4 = state[4];
> -  x5 = state[5];
> -  x6 = state[6];
> -  x7 = state[7];
> -  x8 = state[8];
> -  x9 = state[9];
> -  x10 = state[10];
> -  x11 = state[11];
> -  x12 = state[12];
> -  x13 = state[13];
> -  x14 = state[14];
> -  x15 = state[15];
> -
> -  for (int i = 0; i < 20; i += 2)
> -    {
> -#define QROUND(_x0, _x1, _x2, _x3) 			\
> -  do {							\
> -   _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); 	\
> -   _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); 	\
> -   _x0 = _x0 + _x1; _x3 = rotl32 (8,  (_x0 ^ _x3));	\
> -   _x2 = _x2 + _x3; _x1 = rotl32 (7,  (_x1 ^ _x2));	\
> -  } while(0)
> -
> -      QROUND (x0, x4, x8,  x12);
> -      QROUND (x1, x5, x9,  x13);
> -      QROUND (x2, x6, x10, x14);
> -      QROUND (x3, x7, x11, x15);
> -
> -      QROUND (x0, x5, x10, x15);
> -      QROUND (x1, x6, x11, x12);
> -      QROUND (x2, x7, x8,  x13);
> -      QROUND (x3, x4, x9,  x14);
> -    }
> -
> -  state_final (&src[0], &dst[0], set_state (x0 + state[0]));
> -  state_final (&src[4], &dst[4], set_state (x1 + state[1]));
> -  state_final (&src[8], &dst[8], set_state (x2 + state[2]));
> -  state_final (&src[12], &dst[12], set_state (x3 + state[3]));
> -  state_final (&src[16], &dst[16], set_state (x4 + state[4]));
> -  state_final (&src[20], &dst[20], set_state (x5 + state[5]));
> -  state_final (&src[24], &dst[24], set_state (x6 + state[6]));
> -  state_final (&src[28], &dst[28], set_state (x7 + state[7]));
> -  state_final (&src[32], &dst[32], set_state (x8 + state[8]));
> -  state_final (&src[36], &dst[36], set_state (x9 + state[9]));
> -  state_final (&src[40], &dst[40], set_state (x10 + state[10]));
> -  state_final (&src[44], &dst[44], set_state (x11 + state[11]));
> -  state_final (&src[48], &dst[48], set_state (x12 + state[12]));
> -  state_final (&src[52], &dst[52], set_state (x13 + state[13]));
> -  state_final (&src[56], &dst[56], set_state (x14 + state[14]));
> -  state_final (&src[60], &dst[60], set_state (x15 + state[15]));
> -
> -  state[12]++;
> -}
> -
> -static void
> -__attribute_maybe_unused__
> -chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
> -			size_t bytes)
> -{
> -  while (bytes >= CHACHA20_BLOCK_SIZE)
> -    {
> -      chacha20_block (state, dst, src);
> -
> -      bytes -= CHACHA20_BLOCK_SIZE;
> -      dst += CHACHA20_BLOCK_SIZE;
> -      src += CHACHA20_BLOCK_SIZE;
> -    }
> -
> -  if (__glibc_unlikely (bytes != 0))
> -    {
> -      uint8_t stream[CHACHA20_BLOCK_SIZE];
> -      chacha20_block (state, stream, src);
> -      memcpy (dst, stream, bytes);
> -      explicit_bzero (stream, sizeof stream);
> -    }
> -}
> -
> -/* Get the architecture optimized version.  */
> -#include <chacha20_arch.h>
> diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c
> deleted file mode 100644
> index 45ba54920d..0000000000
> --- a/stdlib/tst-arc4random-chacha20.c
> +++ /dev/null
> @@ -1,167 +0,0 @@
> -/* Basic tests for chacha20 cypher used in arc4random.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <arc4random.h>
> -#include <support/check.h>
> -#include <sys/cdefs.h>
> -
> -/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random
> -   actual does.  */
> -#include <chacha20.c>
> -
> -static int
> -do_test (void)
> -{
> -  const uint8_t key[CHACHA20_KEY_SIZE] =
> -    {
> -      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> -      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> -      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> -      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> -    };
> -  const uint8_t iv[CHACHA20_IV_SIZE] =
> -    {
> -      0x0, 0x0, 0x0, 0x0,
> -      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> -    };
> -  const uint8_t expected1[CHACHA20_BUFSIZE] =
> -    {
> -      0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a,
> -      0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d,
> -      0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda,
> -      0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f,
> -      0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1,
> -      0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07,
> -      0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73,
> -      0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69,
> -      0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21,
> -      0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8,
> -      0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac,
> -      0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6,
> -      0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0,
> -      0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3,
> -      0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f,
> -      0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e,
> -      0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71,
> -      0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3,
> -      0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5,
> -      0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34,
> -      0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d,
> -      0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c,
> -      0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31,
> -      0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad,
> -      0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0,
> -      0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72,
> -      0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16,
> -      0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9,
> -      0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a,
> -      0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc,
> -      0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2,
> -      0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb,
> -      0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10,
> -      0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b,
> -      0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2,
> -      0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17,
> -      0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca,
> -      0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84,
> -      0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3,
> -      0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1,
> -      0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7,
> -      0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9,
> -      0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3,
> -      0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e,
> -      0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb,
> -      0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8,
> -      0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb
> -    };
> -
> -  const uint8_t expected2[CHACHA20_BUFSIZE] =
> -    {
> -      0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36,
> -      0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22,
> -      0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50,
> -      0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10,
> -      0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef,
> -      0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30,
> -      0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c,
> -      0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6,
> -      0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e,
> -      0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90,
> -      0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3,
> -      0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba,
> -      0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4,
> -      0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea,
> -      0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a,
> -      0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80,
> -      0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2,
> -      0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8,
> -      0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce,
> -      0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb,
> -      0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9,
> -      0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc,
> -      0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86,
> -      0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0,
> -      0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4,
> -      0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1,
> -      0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4,
> -      0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f,
> -      0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57,
> -      0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97,
> -      0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef,
> -      0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13,
> -      0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65,
> -      0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30,
> -      0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6,
> -      0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee,
> -      0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c,
> -      0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66,
> -      0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a,
> -      0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c,
> -      0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65,
> -      0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b,
> -      0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88,
> -      0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1,
> -      0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0,
> -      0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36,
> -      0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18
> -    };
> -
> -  /* Check with the expected internal arc4random keystream buffer.  Some
> -     architecture optimizations expect a buffer with a minimum size which
> -     is a multiple of the ChaCha20 block size, so they might not be prepared
> -     to handle smaller buffers.  */
> -
> -  uint8_t output[CHACHA20_BUFSIZE];
> -
> -  uint32_t state[CHACHA20_STATE_LEN];
> -  chacha20_init (state, key, iv);
> -
> -  /* Check with the initial state.  */
> -  uint8_t input[CHACHA20_BUFSIZE] = { 0 };
> -
> -  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
> -  TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE);
> -
> -  /* And on the next round.  */
> -  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
> -  TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE);
> -
> -  return 0;
> -}
> -
> -#include <support/test-driver.c>
> diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
> index 7dfd1b62dd..17fb1c5b72 100644
> --- a/sysdeps/aarch64/Makefile
> +++ b/sysdeps/aarch64/Makefile
> @@ -51,10 +51,6 @@ ifeq ($(subdir),csu)
>  gen-as-const-headers += tlsdesc.sym
>  endif
>  
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += chacha20-aarch64
> -endif
> -
>  ifeq ($(subdir),gmon)
>  CFLAGS-mcount.c += -mgeneral-regs-only
>  endif
> diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S
> deleted file mode 100644
> index cce5291c5c..0000000000
> --- a/sysdeps/aarch64/chacha20-aarch64.S
> +++ /dev/null
> @@ -1,314 +0,0 @@
> -/* Optimized AArch64 implementation of ChaCha20 cipher.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
> -
> -   This file is part of Libgcrypt.
> -
> -   Libgcrypt is free software; you can redistribute it and/or modify
> -   it under the terms of the GNU Lesser General Public License as
> -   published by the Free Software Foundation; either version 2.1 of
> -   the License, or (at your option) any later version.
> -
> -   Libgcrypt is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -   GNU Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with this program; if not, see <https://www.gnu.org/licenses/>.
> - */
> -
> -/* Based on D. J. Bernstein reference implementation at
> -   http://cr.yp.to/chacha.html:
> -
> -   chacha-regs.c version 20080118
> -   D. J. Bernstein
> -   Public domain.  */
> -
> -#include <sysdep.h>
> -
> -/* Only LE is supported.  */
> -#ifdef __AARCH64EL__
> -
> -#define GET_DATA_POINTER(reg, name) \
> -        adrp    reg, name ; \
> -        add     reg, reg, :lo12:name
> -
> -/* 'ret' instruction replacement for straight-line speculation mitigation */
> -#define ret_spec_stop \
> -        ret; dsb sy; isb;
> -
> -.cpu generic+simd
> -
> -.text
> -
> -/* register macros */
> -#define INPUT     x0
> -#define DST       x1
> -#define SRC       x2
> -#define NBLKS     x3
> -#define ROUND     x4
> -#define INPUT_CTR x5
> -#define INPUT_POS x6
> -#define CTR       x7
> -
> -/* vector registers */
> -#define X0 v16
> -#define X4 v17
> -#define X8 v18
> -#define X12 v19
> -
> -#define X1 v20
> -#define X5 v21
> -
> -#define X9 v22
> -#define X13 v23
> -#define X2 v24
> -#define X6 v25
> -
> -#define X3 v26
> -#define X7 v27
> -#define X11 v28
> -#define X15 v29
> -
> -#define X10 v30
> -#define X14 v31
> -
> -#define VCTR    v0
> -#define VTMP0   v1
> -#define VTMP1   v2
> -#define VTMP2   v3
> -#define VTMP3   v4
> -#define X12_TMP v5
> -#define X13_TMP v6
> -#define ROT8    v7
> -
> -/**********************************************************************
> -  helper macros
> - **********************************************************************/
> -
> -#define _(...) __VA_ARGS__
> -
> -#define vpunpckldq(s1, s2, dst) \
> -	zip1 dst.4s, s2.4s, s1.4s;
> -
> -#define vpunpckhdq(s1, s2, dst) \
> -	zip2 dst.4s, s2.4s, s1.4s;
> -
> -#define vpunpcklqdq(s1, s2, dst) \
> -	zip1 dst.2d, s2.2d, s1.2d;
> -
> -#define vpunpckhqdq(s1, s2, dst) \
> -	zip2 dst.2d, s2.2d, s1.2d;
> -
> -/* 4x4 32-bit integer matrix transpose */
> -#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
> -	vpunpckhdq(x1, x0, t2); \
> -	vpunpckldq(x1, x0, x0); \
> -	\
> -	vpunpckldq(x3, x2, t1); \
> -	vpunpckhdq(x3, x2, x2); \
> -	\
> -	vpunpckhqdq(t1, x0, x1); \
> -	vpunpcklqdq(t1, x0, x0); \
> -	\
> -	vpunpckhqdq(x2, t2, x3); \
> -	vpunpcklqdq(x2, t2, x2);
> -
> -/**********************************************************************
> -  4-way chacha20
> - **********************************************************************/
> -
> -#define XOR(d,s1,s2) \
> -	eor d.16b, s2.16b, s1.16b;
> -
> -#define PLUS(ds,s) \
> -	add ds.4s, ds.4s, s.4s;
> -
> -#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \
> -	shl dst1.4s, src1.4s, #(c);		\
> -	shl dst2.4s, src2.4s, #(c);		\
> -	shl dst3.4s, src3.4s, #(c);		\
> -	shl dst4.4s, src4.4s, #(c);		\
> -	sri dst1.4s, src1.4s, #(32 - (c));	\
> -	sri dst2.4s, src2.4s, #(32 - (c));	\
> -	sri dst3.4s, src3.4s, #(32 - (c));	\
> -	sri dst4.4s, src4.4s, #(32 - (c));
> -
> -#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
> -	tbl dst1.16b, {src1.16b}, ROT8.16b;     \
> -	tbl dst2.16b, {src2.16b}, ROT8.16b;	\
> -	tbl dst3.16b, {src3.16b}, ROT8.16b;	\
> -	tbl dst4.16b, {src4.16b}, ROT8.16b;
> -
> -#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
> -	rev32 dst1.8h, src1.8h;			\
> -	rev32 dst2.8h, src2.8h;			\
> -	rev32 dst3.8h, src3.8h;			\
> -	rev32 dst4.8h, src4.8h;
> -
> -#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \
> -	PLUS(a1,b1); PLUS(a2,b2);						\
> -	PLUS(a3,b3); PLUS(a4,b4);						\
> -	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
> -	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
> -		ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4);		\
> -	PLUS(c1,d1); PLUS(c2,d2);						\
> -	PLUS(c3,d3); PLUS(c4,d4);						\
> -	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
> -	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
> -		ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4)		\
> -	PLUS(a1,b1); PLUS(a2,b2);						\
> -	PLUS(a3,b3); PLUS(a4,b4);						\
> -	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
> -	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
> -		ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4)		\
> -	PLUS(c1,d1); PLUS(c2,d2);						\
> -	PLUS(c3,d3); PLUS(c4,d4);						\
> -	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
> -	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
> -		ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4)		\
> -
> -.align 4
> -L(__chacha20_blocks4_data_inc_counter):
> -	.long 0,1,2,3
> -
> -.align 4
> -L(__chacha20_blocks4_data_rot8):
> -	.byte 3,0,1,2
> -	.byte 7,4,5,6
> -	.byte 11,8,9,10
> -	.byte 15,12,13,14
> -
> -.hidden __chacha20_neon_blocks4
> -ENTRY (__chacha20_neon_blocks4)
> -	/* input:
> -	 *	x0: input
> -	 *	x1: dst
> -	 *	x2: src
> -	 *	x3: nblks (multiple of 4)
> -	 */
> -
> -	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8))
> -	add INPUT_CTR, INPUT, #(12*4);
> -	ld1 {ROT8.16b}, [CTR];
> -	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter))
> -	mov INPUT_POS, INPUT;
> -	ld1 {VCTR.16b}, [CTR];
> -
> -L(loop4):
> -	/* Construct counter vectors X12 and X13 */
> -
> -	ld1 {X15.16b}, [INPUT_CTR];
> -	mov ROUND, #20;
> -	ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
> -
> -	dup X12.4s, X15.s[0];
> -	dup X13.4s, X15.s[1];
> -	ldr CTR, [INPUT_CTR];
> -	add X12.4s, X12.4s, VCTR.4s;
> -	dup X0.4s, VTMP1.s[0];
> -	dup X1.4s, VTMP1.s[1];
> -	dup X2.4s, VTMP1.s[2];
> -	dup X3.4s, VTMP1.s[3];
> -	dup X14.4s, X15.s[2];
> -	cmhi VTMP0.4s, VCTR.4s, X12.4s;
> -	dup X15.4s, X15.s[3];
> -	add CTR, CTR, #4; /* Update counter */
> -	dup X4.4s, VTMP2.s[0];
> -	dup X5.4s, VTMP2.s[1];
> -	dup X6.4s, VTMP2.s[2];
> -	dup X7.4s, VTMP2.s[3];
> -	sub X13.4s, X13.4s, VTMP0.4s;
> -	dup X8.4s, VTMP3.s[0];
> -	dup X9.4s, VTMP3.s[1];
> -	dup X10.4s, VTMP3.s[2];
> -	dup X11.4s, VTMP3.s[3];
> -	mov X12_TMP.16b, X12.16b;
> -	mov X13_TMP.16b, X13.16b;
> -	str CTR, [INPUT_CTR];
> -
> -L(round2):
> -	subs ROUND, ROUND, #2
> -	QUARTERROUND4(X0, X4,  X8, X12,   X1, X5,  X9, X13,
> -		      X2, X6, X10, X14,   X3, X7, X11, X15,
> -		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
> -	QUARTERROUND4(X0, X5, X10, X15,   X1, X6, X11, X12,
> -		      X2, X7,  X8, X13,   X3, X4,  X9, X14,
> -		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
> -	b.ne L(round2);
> -
> -	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
> -
> -	PLUS(X12, X12_TMP);        /* INPUT + 12 * 4 + counter */
> -	PLUS(X13, X13_TMP);        /* INPUT + 13 * 4 + counter */
> -
> -	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
> -	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
> -	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
> -	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
> -	PLUS(X0, VTMP2);
> -	PLUS(X1, VTMP3);
> -	PLUS(X2, X12_TMP);
> -	PLUS(X3, X13_TMP);
> -
> -	dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
> -	dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
> -	dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
> -	dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
> -	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
> -	mov INPUT_POS, INPUT;
> -	PLUS(X4, VTMP2);
> -	PLUS(X5, VTMP3);
> -	PLUS(X6, X12_TMP);
> -	PLUS(X7, X13_TMP);
> -
> -	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
> -	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
> -	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
> -	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
> -	dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
> -	dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
> -	PLUS(X8, VTMP2);
> -	PLUS(X9, VTMP3);
> -	PLUS(X10, X12_TMP);
> -	PLUS(X11, X13_TMP);
> -	PLUS(X14, VTMP0);
> -	PLUS(X15, VTMP1);
> -
> -	transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
> -	transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
> -	transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
> -	transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
> -
> -	subs NBLKS, NBLKS, #4;
> -
> -	st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64
> -	st1 {X1.16b,X5.16b}, [DST], #32;
> -	st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64
> -	st1 {X10.16b,X14.16b}, [DST], #32;
> -	st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64;
> -
> -	b.ne L(loop4);
> -
> -	ret_spec_stop
> -END (__chacha20_neon_blocks4)
> -
> -#endif
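
For readers following the removed NEON code: the QUARTERROUND4 macro above applies the standard ChaCha quarter round (add, xor, rotate by 16, 12, 8 and 7) to four blocks at once. A minimal scalar sketch of one quarter round is given below, assuming plain C rather than glibc internals; the names rotl32 and quarter_round are illustrative and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by c bits; only 0 < c < 32 is meaningful
   here, which covers the four ChaCha rotation amounts.  */
uint32_t
rotl32 (uint32_t v, unsigned int c)
{
  assert (c > 0 && c < 32);
  return (v << c) | (v >> (32 - c));
}

/* One scalar ChaCha quarter round.  Each line mirrors one
   PLUS / XOR / ROTATE step of the vector macro.  */
void
quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 7);
}
```

In the vector code each of a, b, c and d is a four-lane register holding the same state word for four consecutive blocks, so a single pass through these steps advances four blocks in parallel.
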
> diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
> deleted file mode 100644
> index 37dbb917f1..0000000000
> --- a/sysdeps/aarch64/chacha20_arch.h
> +++ /dev/null
> @@ -1,40 +0,0 @@
> -/* Chacha20 implementation, used on arc4random.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <ldsodefs.h>
> -#include <stdbool.h>
> -
> -unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
> -				      const uint8_t *src, size_t nblks)
> -     attribute_hidden;
> -
> -static void
> -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> -		size_t bytes)
> -{
> -  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
> -		  "CHACHA20_BUFSIZE not multiple of 4");
> -  _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
> -		  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
> -#ifdef __AARCH64EL__
> -  __chacha20_neon_blocks4 (state, dst, src,
> -			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -#else
> -  chacha20_crypt_generic (state, dst, src, bytes);
> -#endif
> -}
> diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
> deleted file mode 100644
> index 1b4559ccbc..0000000000
> --- a/sysdeps/generic/chacha20_arch.h
> +++ /dev/null
> @@ -1,24 +0,0 @@
> -/* Chacha20 implementation, generic interface for encrypt.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -static inline void
> -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> -		size_t bytes)
> -{
> -  chacha20_crypt_generic (state, dst, src, bytes);
> -}
> diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
> index 8a0f37d509..b32b31b5a9 100644
> --- a/sysdeps/generic/tls-internal.c
> +++ b/sysdeps/generic/tls-internal.c
> @@ -16,7 +16,6 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> -#include <stdlib/arc4random.h>
>  #include <string.h>
>  #include <tls-internal.h>
>  
> @@ -27,13 +26,4 @@ __glibc_tls_internal_free (void)
>  {
>    free (__tls_internal.strsignal_buf);
>    free (__tls_internal.strerror_l_buf);
> -
> -  if (__tls_internal.rand_state != NULL)
> -    {
> -      /* Clear any lingering random state before freeing it, so that if the
> -	 thread stack is cached it won't leak any data.  */
> -      explicit_bzero (__tls_internal.rand_state,
> -		      sizeof (*__tls_internal.rand_state));
> -      free (__tls_internal.rand_state);
> -    }
>  }
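
The hunk above drops the wipe-before-free of the per-thread random state. As a rough illustration of that pattern, here is a sketch that assumes a glibc-like libc providing explicit_bzero in <string.h>; the struct and both function names are hypothetical and are not the types used in the patch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-thread keystream buffer.  */
struct rand_state_sketch
{
  uint8_t buf[64];
  size_t have;
};

/* Zero the state with a call the compiler may not optimize away,
   unlike a plain memset issued right before free.  */
void
wipe_state (struct rand_state_sketch *s)
{
  assert (s != NULL);
  explicit_bzero (s, sizeof *s);
}

/* Wipe first, then release, so the freed memory holds no key material.  */
void
wipe_and_free (struct rand_state_sketch *s)
{
  if (s == NULL)
    return;
  wipe_state (s);
  free (s);
}
```

The point of explicit_bzero over memset is only that the store cannot be removed as dead; it gives no guarantee about copies of the state that may exist in registers or elsewhere on the stack.
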
> diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
> index 667068c8cf..e60b86fab1 100644
> --- a/sysdeps/mach/hurd/_Fork.c
> +++ b/sysdeps/mach/hurd/_Fork.c
> @@ -662,8 +662,6 @@ retry:
>        _hurd_malloc_fork_child ();
>        call_function_static_weak (__malloc_fork_unlock_child);
>  
> -      call_function_static_weak (__arc4random_fork_subprocess);
> -
>        /* Run things that want to run in the child task to set up.  */
>        RUN_HOOK (_hurd_fork_child_hook, ());
>  
> diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
> index 7dc02569f6..dd568992e2 100644
> --- a/sysdeps/nptl/_Fork.c
> +++ b/sysdeps/nptl/_Fork.c
> @@ -43,8 +43,6 @@ _Fork (void)
>        self->robust_head.list = &self->robust_head;
>        INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
>  			     sizeof (struct robust_list_head));
> -
> -      call_function_static_weak (__arc4random_fork_subprocess);
>      }
>    return pid;
>  }
> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
> deleted file mode 100644
> index 8c75165f7f..0000000000
> --- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
> +++ /dev/null
> @@ -1,4 +0,0 @@
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += chacha20-ppc
> -CFLAGS-chacha20-ppc.c += -mcpu=power8
> -endif
> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
> deleted file mode 100644
> index cf9e735326..0000000000
> --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
> +++ /dev/null
> @@ -1 +0,0 @@
> -#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
> deleted file mode 100644
> index 08494dc045..0000000000
> --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
> +++ /dev/null
> @@ -1,42 +0,0 @@
> -/* PowerPC optimization for ChaCha20.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <stdbool.h>
> -#include <ldsodefs.h>
> -
> -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
> -					const uint8_t *src, size_t nblks)
> -     attribute_hidden;
> -
> -static void
> -chacha20_crypt (uint32_t *state, uint8_t *dst,
> -		const uint8_t *src, size_t bytes)
> -{
> -  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
> -		  "CHACHA20_BUFSIZE not multiple of 4");
> -  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
> -		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
> -
> -  unsigned long int hwcap = GLRO(dl_hwcap);
> -  unsigned long int hwcap2 = GLRO(dl_hwcap2);
> -  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
> -    __chacha20_power8_blocks4 (state, dst, src,
> -			       CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -  else
> -    chacha20_crypt_generic (state, dst, src, bytes);
> -}
> diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
> index abb0aa3f11..71a59529f3 100644
> --- a/sysdeps/powerpc/powerpc64/power8/Makefile
> +++ b/sysdeps/powerpc/powerpc64/power8/Makefile
> @@ -1,8 +1,3 @@
>  ifeq ($(subdir),string)
>  sysdep_routines += strcasestr-ppc64
>  endif
> -
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += chacha20-ppc
> -CFLAGS-chacha20-ppc.c += -mcpu=power8
> -endif
> diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
> deleted file mode 100644
> index 0bbdcb9363..0000000000
> --- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
> +++ /dev/null
> @@ -1,256 +0,0 @@
> -/* Optimized PowerPC implementation of ChaCha20 cipher.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20
> -   Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
> -
> -   This file is part of Libgcrypt.
> -
> -   Libgcrypt is free software; you can redistribute it and/or modify
> -   it under the terms of the GNU Lesser General Public License as
> -   published by the Free Software Foundation; either version 2.1 of
> -   the License, or (at your option) any later version.
> -
> -   Libgcrypt is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -   GNU Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with this program; if not, see <https://www.gnu.org/licenses/>.
> - */
> -
> -#include <altivec.h>
> -#include <endian.h>
> -#include <stddef.h>
> -#include <stdint.h>
> -#include <sys/cdefs.h>
> -
> -typedef vector unsigned char vector16x_u8;
> -typedef vector unsigned int vector4x_u32;
> -typedef vector unsigned long long vector2x_u64;
> -
> -#if __BYTE_ORDER == __BIG_ENDIAN
> -static const vector16x_u8 le_bswap_const =
> -  { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
> -#endif
> -
> -static inline vector4x_u32
> -vec_rol_elems (vector4x_u32 v, unsigned int idx)
> -{
> -#if __BYTE_ORDER != __BIG_ENDIAN
> -  return vec_sld (v, v, (16 - (4 * idx)) & 15);
> -#else
> -  return vec_sld (v, v, (4 * idx) & 15);
> -#endif
> -}
> -
> -static inline vector4x_u32
> -vec_load_le (unsigned long offset, const unsigned char *ptr)
> -{
> -  vector4x_u32 vec;
> -  vec = vec_vsx_ld (offset, (const uint32_t *)ptr);
> -#if __BYTE_ORDER == __BIG_ENDIAN
> -  vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec,
> -				 le_bswap_const);
> -#endif
> -  return vec;
> -}
> -
> -static inline void
> -vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr)
> -{
> -#if __BYTE_ORDER == __BIG_ENDIAN
> -  vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec,
> -			       le_bswap_const);
> -#endif
> -  vec_vsx_st (vec, offset, (uint32_t *)ptr);
> -}
> -
> -
> -static inline vector4x_u32
> -vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a)
> -{
> -#if __BYTE_ORDER == __BIG_ENDIAN
> -  static const vector16x_u8 swap32 =
> -    { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 };
> -  vector2x_u64 vec, add, sum;
> -
> -  vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32);
> -  add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32);
> -  sum = vec + add;
> -  return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32);
> -#else
> -  return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a));
> -#endif
> -}
> -
> -/**********************************************************************
> -  4-way chacha20
> - **********************************************************************/
> -
> -#define ROTATE(v1,rolv)			\
> -	__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv))
> -
> -#define PLUS(ds,s) \
> -	((ds) += (s))
> -
> -#define XOR(ds,s) \
> -	((ds) ^= (s))
> -
> -#define ADD_U64(v,a) \
> -	(v = vec_add_ctr_u64(v, a))
> -
> -/* 4x4 32-bit integer matrix transpose */
> -#define transpose_4x4(x0, x1, x2, x3) ({ \
> -	vector4x_u32 t1 = vec_mergeh(x0, x2); \
> -	vector4x_u32 t2 = vec_mergel(x0, x2); \
> -	vector4x_u32 t3 = vec_mergeh(x1, x3); \
> -	x3 = vec_mergel(x1, x3); \
> -	x0 = vec_mergeh(t1, t3); \
> -	x1 = vec_mergel(t1, t3); \
> -	x2 = vec_mergeh(t2, x3); \
> -	x3 = vec_mergel(t2, x3); \
> -      })
> -
> -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2)			\
> -	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
> -	    ROTATE(d1, rotate_16); ROTATE(d2, rotate_16);	\
> -	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
> -	    ROTATE(b1, rotate_12); ROTATE(b2, rotate_12);	\
> -	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
> -	    ROTATE(d1, rotate_8); ROTATE(d2, rotate_8);		\
> -	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
> -	    ROTATE(b1, rotate_7); ROTATE(b2, rotate_7);
> -
> -unsigned int attribute_hidden
> -__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src,
> -			   size_t nblks)
> -{
> -  vector4x_u32 counters_0123 = { 0, 1, 2, 3 };
> -  vector4x_u32 counter_4 = { 4, 0, 0, 0 };
> -  vector4x_u32 rotate_16 = { 16, 16, 16, 16 };
> -  vector4x_u32 rotate_12 = { 12, 12, 12, 12 };
> -  vector4x_u32 rotate_8 = { 8, 8, 8, 8 };
> -  vector4x_u32 rotate_7 = { 7, 7, 7, 7 };
> -  vector4x_u32 state0, state1, state2, state3;
> -  vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7;
> -  vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15;
> -  vector4x_u32 tmp;
> -  int i;
> -
> -  /* Force preload of constants to vector registers.  */
> -  __asm__ ("": "+v" (counters_0123) :: "memory");
> -  __asm__ ("": "+v" (counter_4) :: "memory");
> -  __asm__ ("": "+v" (rotate_16) :: "memory");
> -  __asm__ ("": "+v" (rotate_12) :: "memory");
> -  __asm__ ("": "+v" (rotate_8) :: "memory");
> -  __asm__ ("": "+v" (rotate_7) :: "memory");
> -
> -  state0 = vec_vsx_ld (0 * 16, state);
> -  state1 = vec_vsx_ld (1 * 16, state);
> -  state2 = vec_vsx_ld (2 * 16, state);
> -  state3 = vec_vsx_ld (3 * 16, state);
> -
> -  do
> -    {
> -      v0 = vec_splat (state0, 0);
> -      v1 = vec_splat (state0, 1);
> -      v2 = vec_splat (state0, 2);
> -      v3 = vec_splat (state0, 3);
> -      v4 = vec_splat (state1, 0);
> -      v5 = vec_splat (state1, 1);
> -      v6 = vec_splat (state1, 2);
> -      v7 = vec_splat (state1, 3);
> -      v8 = vec_splat (state2, 0);
> -      v9 = vec_splat (state2, 1);
> -      v10 = vec_splat (state2, 2);
> -      v11 = vec_splat (state2, 3);
> -      v12 = vec_splat (state3, 0);
> -      v13 = vec_splat (state3, 1);
> -      v14 = vec_splat (state3, 2);
> -      v15 = vec_splat (state3, 3);
> -
> -      v12 += counters_0123;
> -      v13 -= vec_cmplt (v12, counters_0123);
> -
> -      for (i = 20; i > 0; i -= 2)
> -	{
> -	  QUARTERROUND2 (v0, v4,  v8, v12,   v1, v5,  v9, v13)
> -	  QUARTERROUND2 (v2, v6, v10, v14,   v3, v7, v11, v15)
> -	  QUARTERROUND2 (v0, v5, v10, v15,   v1, v6, v11, v12)
> -	  QUARTERROUND2 (v2, v7,  v8, v13,   v3, v4,  v9, v14)
> -	}
> -
> -      v0 += vec_splat (state0, 0);
> -      v1 += vec_splat (state0, 1);
> -      v2 += vec_splat (state0, 2);
> -      v3 += vec_splat (state0, 3);
> -      v4 += vec_splat (state1, 0);
> -      v5 += vec_splat (state1, 1);
> -      v6 += vec_splat (state1, 2);
> -      v7 += vec_splat (state1, 3);
> -      v8 += vec_splat (state2, 0);
> -      v9 += vec_splat (state2, 1);
> -      v10 += vec_splat (state2, 2);
> -      v11 += vec_splat (state2, 3);
> -      tmp = vec_splat( state3, 0);
> -      tmp += counters_0123;
> -      v12 += tmp;
> -      v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123);
> -      v14 += vec_splat (state3, 2);
> -      v15 += vec_splat (state3, 3);
> -      ADD_U64 (state3, counter_4);
> -
> -      transpose_4x4 (v0, v1, v2, v3);
> -      transpose_4x4 (v4, v5, v6, v7);
> -      transpose_4x4 (v8, v9, v10, v11);
> -      transpose_4x4 (v12, v13, v14, v15);
> -
> -      vec_store_le (v0, (64 * 0 + 16 * 0), dst);
> -      vec_store_le (v1, (64 * 1 + 16 * 0), dst);
> -      vec_store_le (v2, (64 * 2 + 16 * 0), dst);
> -      vec_store_le (v3, (64 * 3 + 16 * 0), dst);
> -
> -      vec_store_le (v4, (64 * 0 + 16 * 1), dst);
> -      vec_store_le (v5, (64 * 1 + 16 * 1), dst);
> -      vec_store_le (v6, (64 * 2 + 16 * 1), dst);
> -      vec_store_le (v7, (64 * 3 + 16 * 1), dst);
> -
> -      vec_store_le (v8, (64 * 0 + 16 * 2), dst);
> -      vec_store_le (v9, (64 * 1 + 16 * 2), dst);
> -      vec_store_le (v10, (64 * 2 + 16 * 2), dst);
> -      vec_store_le (v11, (64 * 3 + 16 * 2), dst);
> -
> -      vec_store_le (v12, (64 * 0 + 16 * 3), dst);
> -      vec_store_le (v13, (64 * 1 + 16 * 3), dst);
> -      vec_store_le (v14, (64 * 2 + 16 * 3), dst);
> -      vec_store_le (v15, (64 * 3 + 16 * 3), dst);
> -
> -      src += 4*64;
> -      dst += 4*64;
> -
> -      nblks -= 4;
> -    }
> -  while (nblks);
> -
> -  vec_vsx_st (state3, 3 * 16, state);
> -
> -  return 0;
> -}
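
Both the NEON and the POWER8 routines above build the per-block counters the same way: add the lane index 0..3 to the low counter word, then detect wraparound with an unsigned compare (cmhi on AArch64, vec_cmplt here) and carry into the high word. A scalar sketch of that step follows, assuming a 64-bit counter split into two 32-bit words; block_counters4 is an illustrative name, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the (low, high) counter words for four consecutive blocks.
   If the low word wraps modulo 2^32, the result is smaller than the
   lane index that was added, which signals a carry into the high word.  */
void
block_counters4 (uint32_t ctr_lo, uint32_t ctr_hi,
                 uint32_t out_lo[4], uint32_t out_hi[4])
{
  assert (out_lo != NULL && out_hi != NULL);
  for (uint32_t lane = 0; lane < 4; lane++)
    {
      out_lo[lane] = ctr_lo + lane;                  /* wraps mod 2^32 */
      out_hi[lane] = ctr_hi + (out_lo[lane] < lane); /* carry is 0 or 1 */
    }
}
```

The vector versions get the same effect by subtracting the compare mask, since an all-ones lane is -1 as a 32-bit integer.
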
> diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
> deleted file mode 100644
> index ded06762b6..0000000000
> --- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
> +++ /dev/null
> @@ -1,37 +0,0 @@
> -/* PowerPC optimization for ChaCha20.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <stdbool.h>
> -#include <ldsodefs.h>
> -
> -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
> -					const uint8_t *src, size_t nblks)
> -     attribute_hidden;
> -
> -static void
> -chacha20_crypt (uint32_t *state, uint8_t *dst,
> -		const uint8_t *src, size_t bytes)
> -{
> -  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
> -		  "CHACHA20_BUFSIZE not multiple of 4");
> -  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
> -		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
> -
> -  __chacha20_power8_blocks4 (state, dst, src,
> -			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -}
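
One detail of the removed 4-way routines that may not be obvious: during the rounds each vector holds one state word for four different blocks, so the result has to be transposed before it can be stored as four contiguous 64-byte blocks. The sketch below shows the 4x4 transpose in plain C, assuming a row-major array in place of vector registers; transpose_4x4_u32 is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-place transpose of a 4x4 matrix of 32-bit words.  Before the call,
   row w holds state word w for blocks 0..3; afterwards, row k holds
   four consecutive state words of block k.  */
void
transpose_4x4_u32 (uint32_t m[4][4])
{
  assert (m != NULL);
  for (size_t i = 0; i < 4; i++)
    for (size_t j = i + 1; j < 4; j++)
      {
        uint32_t t = m[i][j];
        m[i][j] = m[j][i];
        m[j][i] = t;
      }
}
```

The vector implementations do this with merge or zip instructions instead of scalar swaps, applied once per group of four state words.
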
> diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
> index 96c110f490..66ed844e68 100644
> --- a/sysdeps/s390/s390-64/Makefile
> +++ b/sysdeps/s390/s390-64/Makefile
> @@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache
>  endif
>  
>  endif # $(subdir) == elf
> -
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += \
> -  chacha20-s390x \
> -  # sysdep_routines
> -endif
> diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S
> deleted file mode 100644
> index e38504d370..0000000000
> --- a/sysdeps/s390/s390-64/chacha20-s390x.S
> +++ /dev/null
> @@ -1,573 +0,0 @@
> -/* Optimized s390x implementation of ChaCha20 cipher.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -/* chacha20-s390x.S  -  zSeries implementation of ChaCha20 cipher
> -
> -   Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi>
> -
> -   This file is part of Libgcrypt.
> -
> -   Libgcrypt is free software; you can redistribute it and/or modify
> -   it under the terms of the GNU Lesser General Public License as
> -   published by the Free Software Foundation; either version 2.1 of
> -   the License, or (at your option) any later version.
> -
> -   Libgcrypt is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -   GNU Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with this program; if not, see <https://www.gnu.org/licenses/>.
> - */
> -
> -#include <sysdep.h>
> -
> -#ifdef HAVE_S390_VX_ASM_SUPPORT
> -
> -/* CFA expressions are used for pointing CFA and registers to
> - * SP relative offsets. */
> -# define DW_REGNO_SP 15
> -
> -/* Fixed length encoding used for integers for now. */
> -# define DW_SLEB128_7BIT(value) \
> -        0x00|((value) & 0x7f)
> -# define DW_SLEB128_28BIT(value) \
> -        0x80|((value)&0x7f), \
> -        0x80|(((value)>>7)&0x7f), \
> -        0x80|(((value)>>14)&0x7f), \
> -        0x00|(((value)>>21)&0x7f)
> -
> -# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \
> -        .cfi_escape \
> -          0x0f, /* DW_CFA_def_cfa_expression */ \
> -            DW_SLEB128_7BIT(11), /* length */ \
> -          0x7f, /* DW_OP_breg15, rsp + constant */ \
> -            DW_SLEB128_28BIT(rsp_offs), \
> -          0x06, /* DW_OP_deref */ \
> -          0x23, /* DW_OP_plus_constu */ \
> -            DW_SLEB128_28BIT((cfa_depth)+160)
> -
> -.machine "z13+vx"
> -.text
> -
> -.balign 16
> -.Lconsts:
> -.Lwordswap:
> -	.byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3
> -.Lbswap128:
> -	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
> -.Lbswap32:
> -	.byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
> -.Lone:
> -	.long 0, 0, 0, 1
> -.Ladd_counter_0123:
> -	.long 0, 1, 2, 3
> -.Ladd_counter_4567:
> -	.long 4, 5, 6, 7
> -
> -/* register macros */
> -#define INPUT %r2
> -#define DST   %r3
> -#define SRC   %r4
> -#define NBLKS %r0
> -#define ROUND %r1
> -
> -/* stack structure */
> -
> -#define STACK_FRAME_STD    (8 * 16 + 8 * 4)
> -#define STACK_FRAME_F8_F15 (8 * 8)
> -#define STACK_FRAME_Y0_Y15 (16 * 16)
> -#define STACK_FRAME_CTR    (4 * 16)
> -#define STACK_FRAME_PARAMS (6 * 8)
> -
> -#define STACK_MAX   (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \
> -		     STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \
> -		     STACK_FRAME_PARAMS)
> -
> -#define STACK_F8     (STACK_MAX - STACK_FRAME_F8_F15)
> -#define STACK_F9     (STACK_F8 + 8)
> -#define STACK_F10    (STACK_F9 + 8)
> -#define STACK_F11    (STACK_F10 + 8)
> -#define STACK_F12    (STACK_F11 + 8)
> -#define STACK_F13    (STACK_F12 + 8)
> -#define STACK_F14    (STACK_F13 + 8)
> -#define STACK_F15    (STACK_F14 + 8)
> -#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15)
> -#define STACK_CTR    (STACK_Y0_Y15 - STACK_FRAME_CTR)
> -#define STACK_INPUT  (STACK_CTR - STACK_FRAME_PARAMS)
> -#define STACK_DST    (STACK_INPUT + 8)
> -#define STACK_SRC    (STACK_DST + 8)
> -#define STACK_NBLKS  (STACK_SRC + 8)
> -#define STACK_POCTX  (STACK_NBLKS + 8)
> -#define STACK_POSRC  (STACK_POCTX + 8)
> -
> -#define STACK_G0_H3  STACK_Y0_Y15
> -
> -/* vector registers */
> -#define A0 %v0
> -#define A1 %v1
> -#define A2 %v2
> -#define A3 %v3
> -
> -#define B0 %v4
> -#define B1 %v5
> -#define B2 %v6
> -#define B3 %v7
> -
> -#define C0 %v8
> -#define C1 %v9
> -#define C2 %v10
> -#define C3 %v11
> -
> -#define D0 %v12
> -#define D1 %v13
> -#define D2 %v14
> -#define D3 %v15
> -
> -#define E0 %v16
> -#define E1 %v17
> -#define E2 %v18
> -#define E3 %v19
> -
> -#define F0 %v20
> -#define F1 %v21
> -#define F2 %v22
> -#define F3 %v23
> -
> -#define G0 %v24
> -#define G1 %v25
> -#define G2 %v26
> -#define G3 %v27
> -
> -#define H0 %v28
> -#define H1 %v29
> -#define H2 %v30
> -#define H3 %v31
> -
> -#define IO0 E0
> -#define IO1 E1
> -#define IO2 E2
> -#define IO3 E3
> -#define IO4 F0
> -#define IO5 F1
> -#define IO6 F2
> -#define IO7 F3
> -
> -#define S0 G0
> -#define S1 G1
> -#define S2 G2
> -#define S3 G3
> -
> -#define TMP0 H0
> -#define TMP1 H1
> -#define TMP2 H2
> -#define TMP3 H3
> -
> -#define X0 A0
> -#define X1 A1
> -#define X2 A2
> -#define X3 A3
> -#define X4 B0
> -#define X5 B1
> -#define X6 B2
> -#define X7 B3
> -#define X8 C0
> -#define X9 C1
> -#define X10 C2
> -#define X11 C3
> -#define X12 D0
> -#define X13 D1
> -#define X14 D2
> -#define X15 D3
> -
> -#define Y0 E0
> -#define Y1 E1
> -#define Y2 E2
> -#define Y3 E3
> -#define Y4 F0
> -#define Y5 F1
> -#define Y6 F2
> -#define Y7 F3
> -#define Y8 G0
> -#define Y9 G1
> -#define Y10 G2
> -#define Y11 G3
> -#define Y12 H0
> -#define Y13 H1
> -#define Y14 H2
> -#define Y15 H3
> -
> -/**********************************************************************
> -  helper macros
> - **********************************************************************/
> -
> -#define _ /*_*/
> -
> -#define START_STACK(last_r) \
> -	lgr %r0, %r15; \
> -	lghi %r1, ~15; \
> -	stmg %r6, last_r, 6 * 8(%r15); \
> -	aghi %r0, -STACK_MAX; \
> -	ngr %r0, %r1; \
> -	lgr %r1, %r15; \
> -	cfi_def_cfa_register(1); \
> -	lgr %r15, %r0; \
> -	stg %r1, 0(%r15); \
> -	cfi_cfa_on_stack(0, 0); \
> -	std %f8, STACK_F8(%r15); \
> -	std %f9, STACK_F9(%r15); \
> -	std %f10, STACK_F10(%r15); \
> -	std %f11, STACK_F11(%r15); \
> -	std %f12, STACK_F12(%r15); \
> -	std %f13, STACK_F13(%r15); \
> -	std %f14, STACK_F14(%r15); \
> -	std %f15, STACK_F15(%r15);
> -
> -#define END_STACK(last_r) \
> -	lg %r1, 0(%r15); \
> -	ld %f8, STACK_F8(%r15); \
> -	ld %f9, STACK_F9(%r15); \
> -	ld %f10, STACK_F10(%r15); \
> -	ld %f11, STACK_F11(%r15); \
> -	ld %f12, STACK_F12(%r15); \
> -	ld %f13, STACK_F13(%r15); \
> -	ld %f14, STACK_F14(%r15); \
> -	ld %f15, STACK_F15(%r15); \
> -	lmg %r6, last_r, 6 * 8(%r1); \
> -	lgr %r15, %r1; \
> -	cfi_def_cfa_register(DW_REGNO_SP);
> -
> -#define PLUS(dst,src) \
> -	vaf dst, dst, src;
> -
> -#define XOR(dst,src) \
> -	vx dst, dst, src;
> -
> -#define ROTATE(v1,c) \
> -	verllf v1, v1, (c)(0);
> -
> -#define WORD_ROTATE(v1,s) \
> -	vsldb v1, v1, v1, ((s) * 4);
> -
> -#define DST_8(OPER, I, J) \
> -	OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \
> -	OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J);
> -
> -/**********************************************************************
> -  round macros
> - **********************************************************************/
> -
> -/**********************************************************************
> -  8-way chacha20 ("vertical")
> - **********************************************************************/
> -
> -#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
> -			      x8,x9,x10,x11,x12,x13,x14,x15,\
> -			      y0,y1,y2,y3,y4,y5,y6,y7,\
> -			      y8,y9,y10,y11,y12,y13,y14,y15,\
> -			      op1,op2,op3,op4,op5,op6,op7,op8,\
> -			      op9,op10,op11,op12) \
> -	op1;							\
> -	PLUS(x0, x1); PLUS(x4, x5);				\
> -	PLUS(x8, x9); PLUS(x12, x13);				\
> -	PLUS(y0, y1); PLUS(y4, y5);				\
> -	PLUS(y8, y9); PLUS(y12, y13);				\
> -	    op2;						\
> -	    XOR(x3, x0);  XOR(x7, x4);				\
> -	    XOR(x11, x8); XOR(x15, x12);			\
> -	    XOR(y3, y0);  XOR(y7, y4);				\
> -	    XOR(y11, y8); XOR(y15, y12);			\
> -		op3;						\
> -		ROTATE(x3, 16); ROTATE(x7, 16);			\
> -		ROTATE(x11, 16); ROTATE(x15, 16);		\
> -		ROTATE(y3, 16); ROTATE(y7, 16);			\
> -		ROTATE(y11, 16); ROTATE(y15, 16);		\
> -	op4;							\
> -	PLUS(x2, x3); PLUS(x6, x7);				\
> -	PLUS(x10, x11); PLUS(x14, x15);				\
> -	PLUS(y2, y3); PLUS(y6, y7);				\
> -	PLUS(y10, y11); PLUS(y14, y15);				\
> -	    op5;						\
> -	    XOR(x1, x2); XOR(x5, x6);				\
> -	    XOR(x9, x10); XOR(x13, x14);			\
> -	    XOR(y1, y2); XOR(y5, y6);				\
> -	    XOR(y9, y10); XOR(y13, y14);			\
> -		op6;						\
> -		ROTATE(x1,12); ROTATE(x5,12);			\
> -		ROTATE(x9,12); ROTATE(x13,12);			\
> -		ROTATE(y1,12); ROTATE(y5,12);			\
> -		ROTATE(y9,12); ROTATE(y13,12);			\
> -	op7;							\
> -	PLUS(x0, x1); PLUS(x4, x5);				\
> -	PLUS(x8, x9); PLUS(x12, x13);				\
> -	PLUS(y0, y1); PLUS(y4, y5);				\
> -	PLUS(y8, y9); PLUS(y12, y13);				\
> -	    op8;						\
> -	    XOR(x3, x0); XOR(x7, x4);				\
> -	    XOR(x11, x8); XOR(x15, x12);			\
> -	    XOR(y3, y0); XOR(y7, y4);				\
> -	    XOR(y11, y8); XOR(y15, y12);			\
> -		op9;						\
> -		ROTATE(x3,8); ROTATE(x7,8);			\
> -		ROTATE(x11,8); ROTATE(x15,8);			\
> -		ROTATE(y3,8); ROTATE(y7,8);			\
> -		ROTATE(y11,8); ROTATE(y15,8);			\
> -	op10;							\
> -	PLUS(x2, x3); PLUS(x6, x7);				\
> -	PLUS(x10, x11); PLUS(x14, x15);				\
> -	PLUS(y2, y3); PLUS(y6, y7);				\
> -	PLUS(y10, y11); PLUS(y14, y15);				\
> -	    op11;						\
> -	    XOR(x1, x2); XOR(x5, x6);				\
> -	    XOR(x9, x10); XOR(x13, x14);			\
> -	    XOR(y1, y2); XOR(y5, y6);				\
> -	    XOR(y9, y10); XOR(y13, y14);			\
> -		op12;						\
> -		ROTATE(x1,7); ROTATE(x5,7);			\
> -		ROTATE(x9,7); ROTATE(x13,7);			\
> -		ROTATE(y1,7); ROTATE(y5,7);			\
> -		ROTATE(y9,7); ROTATE(y13,7);
> -
> -#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\
> -			 y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \
> -	QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
> -			      x8,x9,x10,x11,x12,x13,x14,x15,\
> -			      y0,y1,y2,y3,y4,y5,y6,y7,\
> -			      y8,y9,y10,y11,y12,y13,y14,y15,\
> -			      ,,,,,,,,,,,)
> -
> -#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \
> -	  vmrhf tmp0, v0, v1;					\
> -	  vmrhf tmp1, v2, v3;					\
> -	  vmrlf tmp2, v0, v1;					\
> -	  vmrlf   v3, v2, v3;					\
> -	  vmrhf tmpa, va, vb;					\
> -	  vmrhf tmpb, vc, vd;					\
> -	  vmrlf tmpc, va, vb;					\
> -	  vmrlf   vd, vc, vd;					\
> -	  vpdi v0, tmp0, tmp1, 0;				\
> -	  vpdi v1, tmp0, tmp1, 5;				\
> -	  vpdi v2, tmp2,   v3, 0;				\
> -	  vpdi v3, tmp2,   v3, 5;				\
> -	  vpdi va, tmpa, tmpb, 0;				\
> -	  vpdi vb, tmpa, tmpb, 5;				\
> -	  vpdi vc, tmpc,   vd, 0;				\
> -	  vpdi vd, tmpc,   vd, 5;
> -
> -.balign 8
> -.globl __chacha20_s390x_vx_blocks8
> -ENTRY (__chacha20_s390x_vx_blocks8)
> -	/* input:
> -	 *	%r2: input
> -	 *	%r3: dst
> -	 *	%r4: src
> -	 *	%r5: nblks (multiple of 8)
> -	 */
> -
> -	START_STACK(%r8);
> -	lgr NBLKS, %r5;
> -
> -	larl %r7, .Lconsts;
> -
> -	/* Load counter. */
> -	lg %r8, (12 * 4)(INPUT);
> -	rllg %r8, %r8, 32;
> -
> -.balign 4
> -	/* Process eight chacha20 blocks per loop. */
> -.Lloop8:
> -	vlm Y0, Y3, 0(INPUT);
> -
> -	slgfi NBLKS, 8;
> -	lghi ROUND, (20 / 2);
> -
> -	/* Construct counter vectors X12/X13 & Y12/Y13. */
> -	vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7);
> -	vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7);
> -	vrepf Y12, Y3, 0;
> -	vrepf Y13, Y3, 1;
> -	vaccf X5, Y12, X4;
> -	vaccf Y5, Y12, Y4;
> -	vaf X12, Y12, X4;
> -	vaf Y12, Y12, Y4;
> -	vaf X13, Y13, X5;
> -	vaf Y13, Y13, Y5;
> -
> -	vrepf X0, Y0, 0;
> -	vrepf X1, Y0, 1;
> -	vrepf X2, Y0, 2;
> -	vrepf X3, Y0, 3;
> -	vrepf X4, Y1, 0;
> -	vrepf X5, Y1, 1;
> -	vrepf X6, Y1, 2;
> -	vrepf X7, Y1, 3;
> -	vrepf X8, Y2, 0;
> -	vrepf X9, Y2, 1;
> -	vrepf X10, Y2, 2;
> -	vrepf X11, Y2, 3;
> -	vrepf X14, Y3, 2;
> -	vrepf X15, Y3, 3;
> -
> -	/* Store counters for blocks 0-7. */
> -	vstm X12, X13, (STACK_CTR + 0 * 16)(%r15);
> -	vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15);
> -
> -	vlr Y0, X0;
> -	vlr Y1, X1;
> -	vlr Y2, X2;
> -	vlr Y3, X3;
> -	vlr Y4, X4;
> -	vlr Y5, X5;
> -	vlr Y6, X6;
> -	vlr Y7, X7;
> -	vlr Y8, X8;
> -	vlr Y9, X9;
> -	vlr Y10, X10;
> -	vlr Y11, X11;
> -	vlr Y14, X14;
> -	vlr Y15, X15;
> -
> -	/* Update and store counter. */
> -	agfi %r8, 8;
> -	rllg %r5, %r8, 32;
> -	stg %r5, (12 * 4)(INPUT);
> -
> -.balign 4
> -.Lround2_8:
> -	QUARTERROUND4_V8(X0, X4,  X8, X12,   X1, X5,  X9, X13,
> -			 X2, X6, X10, X14,   X3, X7, X11, X15,
> -			 Y0, Y4,  Y8, Y12,   Y1, Y5,  Y9, Y13,
> -			 Y2, Y6, Y10, Y14,   Y3, Y7, Y11, Y15);
> -	QUARTERROUND4_V8(X0, X5, X10, X15,   X1, X6, X11, X12,
> -			 X2, X7,  X8, X13,   X3, X4,  X9, X14,
> -			 Y0, Y5, Y10, Y15,   Y1, Y6, Y11, Y12,
> -			 Y2, Y7,  Y8, Y13,   Y3, Y4,  Y9, Y14);
> -	brctg ROUND, .Lround2_8;
> -
> -	/* Store blocks 4-7. */
> -	vstm Y0, Y15, STACK_Y0_Y15(%r15);
> -
> -	/* Load counters for blocks 0-3. */
> -	vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15);
> -
> -	lghi ROUND, 1;
> -	j .Lfirst_output_4blks_8;
> -
> -.balign 4
> -.Lsecond_output_4blks_8:
> -	/* Load blocks 4-7. */
> -	vlm X0, X15, STACK_Y0_Y15(%r15);
> -
> -	/* Load counters for blocks 4-7. */
> -	vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15);
> -
> -	lghi ROUND, 0;
> -
> -.balign 4
> -	/* Output four chacha20 blocks per loop. */
> -.Lfirst_output_4blks_8:
> -	vlm Y12, Y15, 0(INPUT);
> -	PLUS(X12, Y0);
> -	PLUS(X13, Y1);
> -	vrepf Y0, Y12, 0;
> -	vrepf Y1, Y12, 1;
> -	vrepf Y2, Y12, 2;
> -	vrepf Y3, Y12, 3;
> -	vrepf Y4, Y13, 0;
> -	vrepf Y5, Y13, 1;
> -	vrepf Y6, Y13, 2;
> -	vrepf Y7, Y13, 3;
> -	vrepf Y8, Y14, 0;
> -	vrepf Y9, Y14, 1;
> -	vrepf Y10, Y14, 2;
> -	vrepf Y11, Y14, 3;
> -	vrepf Y14, Y15, 2;
> -	vrepf Y15, Y15, 3;
> -	PLUS(X0, Y0);
> -	PLUS(X1, Y1);
> -	PLUS(X2, Y2);
> -	PLUS(X3, Y3);
> -	PLUS(X4, Y4);
> -	PLUS(X5, Y5);
> -	PLUS(X6, Y6);
> -	PLUS(X7, Y7);
> -	PLUS(X8, Y8);
> -	PLUS(X9, Y9);
> -	PLUS(X10, Y10);
> -	PLUS(X11, Y11);
> -	PLUS(X14, Y14);
> -	PLUS(X15, Y15);
> -
> -	vl Y15, (.Lbswap32 - .Lconsts)(%r7);
> -	TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7,
> -			Y9, Y10, Y11, Y12, Y13, Y14);
> -	TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15,
> -			Y9, Y10, Y11, Y12, Y13, Y14);
> -
> -	vlm Y0, Y14, 0(SRC);
> -	vperm X0, X0, X0, Y15;
> -	vperm X1, X1, X1, Y15;
> -	vperm X2, X2, X2, Y15;
> -	vperm X3, X3, X3, Y15;
> -	vperm X4, X4, X4, Y15;
> -	vperm X5, X5, X5, Y15;
> -	vperm X6, X6, X6, Y15;
> -	vperm X7, X7, X7, Y15;
> -	vperm X8, X8, X8, Y15;
> -	vperm X9, X9, X9, Y15;
> -	vperm X10, X10, X10, Y15;
> -	vperm X11, X11, X11, Y15;
> -	vperm X12, X12, X12, Y15;
> -	vperm X13, X13, X13, Y15;
> -	vperm X14, X14, X14, Y15;
> -	vperm X15, X15, X15, Y15;
> -	vl Y15, (15 * 16)(SRC);
> -
> -	XOR(Y0, X0);
> -	XOR(Y1, X4);
> -	XOR(Y2, X8);
> -	XOR(Y3, X12);
> -	XOR(Y4, X1);
> -	XOR(Y5, X5);
> -	XOR(Y6, X9);
> -	XOR(Y7, X13);
> -	XOR(Y8, X2);
> -	XOR(Y9, X6);
> -	XOR(Y10, X10);
> -	XOR(Y11, X14);
> -	XOR(Y12, X3);
> -	XOR(Y13, X7);
> -	XOR(Y14, X11);
> -	XOR(Y15, X15);
> -	vstm Y0, Y15, 0(DST);
> -
> -	aghi SRC, 256;
> -	aghi DST, 256;
> -
> -	clgije ROUND, 1, .Lsecond_output_4blks_8;
> -
> -	clgijhe NBLKS, 8, .Lloop8;
> -
> -
> -	END_STACK(%r8);
> -	xgr %r2, %r2;
> -	br %r14;
> -END (__chacha20_s390x_vx_blocks8)
> -
> -#endif /* HAVE_S390_VX_ASM_SUPPORT */
> diff --git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h
> deleted file mode 100644
> index 0c6abf77e8..0000000000
> --- a/sysdeps/s390/s390-64/chacha20_arch.h
> +++ /dev/null
> @@ -1,45 +0,0 @@
> -/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <stdbool.h>
> -#include <ldsodefs.h>
> -#include <sys/auxv.h>
> -
> -unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst,
> -					  const uint8_t *src, size_t nblks)
> -     attribute_hidden;
> -
> -static inline void
> -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> -		size_t bytes)
> -{
> -#ifdef HAVE_S390_VX_ASM_SUPPORT
> -  _Static_assert (CHACHA20_BUFSIZE % 8 == 0,
> -		  "CHACHA20_BUFSIZE not multiple of 8");
> -  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
> -		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
> -
> -  if (GLRO(dl_hwcap) & HWCAP_S390_VX)
> -    {
> -      __chacha20_s390x_vx_blocks8 (state, dst, src,
> -				   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -      return;
> -    }
> -#endif
> -  chacha20_crypt_generic (state, dst, src, bytes);
> -}
> diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c
> index 0326ebb767..c8a9ed2d40 100644
> --- a/sysdeps/unix/sysv/linux/tls-internal.c
> +++ b/sysdeps/unix/sysv/linux/tls-internal.c
> @@ -16,7 +16,6 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> -#include <stdlib/arc4random.h>
>  #include <string.h>
>  #include <tls-internal.h>
>  
> @@ -26,13 +25,4 @@ __glibc_tls_internal_free (void)
>    struct pthread *self = THREAD_SELF;
>    free (self->tls_state.strsignal_buf);
>    free (self->tls_state.strerror_l_buf);
> -
> -  if (self->tls_state.rand_state != NULL)
> -    {
> -      /* Clear any lingering random state prior so if the thread stack is
> -         cached it won't leak any data.  */
> -      explicit_bzero (self->tls_state.rand_state,
> -		      sizeof (*self->tls_state.rand_state));
> -      free (self->tls_state.rand_state);
> -    }
>  }
> diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
> index 1178475d75..c19bef2dec 100644
> --- a/sysdeps/x86_64/Makefile
> +++ b/sysdeps/x86_64/Makefile
> @@ -5,13 +5,6 @@ ifeq ($(subdir),csu)
>  gen-as-const-headers += link-defines.sym
>  endif
>  
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += \
> -  chacha20-amd64-sse2 \
> -  chacha20-amd64-avx2 \
> -  # sysdep_routines
> -endif
> -
>  ifeq ($(subdir),gmon)
>  sysdep_routines += _mcount
>  # We cannot compile _mcount.S with -pg because that would create
> diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S
> deleted file mode 100644
> index aefd1cdbd0..0000000000
> --- a/sysdeps/x86_64/chacha20-amd64-avx2.S
> +++ /dev/null
> @@ -1,328 +0,0 @@
> -/* Optimized AVX2 implementation of ChaCha20 cipher.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -/* chacha20-amd64-avx2.S  -  AVX2 implementation of ChaCha20 cipher
> -
> -   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
> -
> -   This file is part of Libgcrypt.
> -
> -   Libgcrypt is free software; you can redistribute it and/or modify
> -   it under the terms of the GNU Lesser General Public License as
> -   published by the Free Software Foundation; either version 2.1 of
> -   the License, or (at your option) any later version.
> -
> -   Libgcrypt is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -   GNU Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with this program; if not, see <https://www.gnu.org/licenses/>.
> -*/
> -
> -/* Based on D. J. Bernstein reference implementation at
> -   http://cr.yp.to/chacha.html:
> -
> -   chacha-regs.c version 20080118
> -   D. J. Bernstein
> -   Public domain.  */
> -
> -#include <sysdep.h>
> -
> -#ifdef PIC
> -#  define rRIP (%rip)
> -#else
> -#  define rRIP
> -#endif
> -
> -/* register macros */
> -#define INPUT %rdi
> -#define DST   %rsi
> -#define SRC   %rdx
> -#define NBLKS %rcx
> -#define ROUND %eax
> -
> -/* stack structure */
> -#define STACK_VEC_X12 (32)
> -#define STACK_VEC_X13 (32 + STACK_VEC_X12)
> -#define STACK_TMP     (32 + STACK_VEC_X13)
> -#define STACK_TMP1    (32 + STACK_TMP)
> -
> -#define STACK_MAX     (32 + STACK_TMP1)
> -
> -/* vector registers */
> -#define X0 %ymm0
> -#define X1 %ymm1
> -#define X2 %ymm2
> -#define X3 %ymm3
> -#define X4 %ymm4
> -#define X5 %ymm5
> -#define X6 %ymm6
> -#define X7 %ymm7
> -#define X8 %ymm8
> -#define X9 %ymm9
> -#define X10 %ymm10
> -#define X11 %ymm11
> -#define X12 %ymm12
> -#define X13 %ymm13
> -#define X14 %ymm14
> -#define X15 %ymm15
> -
> -#define X0h %xmm0
> -#define X1h %xmm1
> -#define X2h %xmm2
> -#define X3h %xmm3
> -#define X4h %xmm4
> -#define X5h %xmm5
> -#define X6h %xmm6
> -#define X7h %xmm7
> -#define X8h %xmm8
> -#define X9h %xmm9
> -#define X10h %xmm10
> -#define X11h %xmm11
> -#define X12h %xmm12
> -#define X13h %xmm13
> -#define X14h %xmm14
> -#define X15h %xmm15
> -
> -/**********************************************************************
> -  helper macros
> - **********************************************************************/
> -
> -/* 4x4 32-bit integer matrix transpose */
> -#define transpose_4x4(x0,x1,x2,x3,t1,t2) \
> -	vpunpckhdq x1, x0, t2; \
> -	vpunpckldq x1, x0, x0; \
> -	\
> -	vpunpckldq x3, x2, t1; \
> -	vpunpckhdq x3, x2, x2; \
> -	\
> -	vpunpckhqdq t1, x0, x1; \
> -	vpunpcklqdq t1, x0, x0; \
> -	\
> -	vpunpckhqdq x2, t2, x3; \
> -	vpunpcklqdq x2, t2, x2;
> -
> -/* 2x2 128-bit matrix transpose */
> -#define transpose_16byte_2x2(x0,x1,t1) \
> -	vmovdqa    x0, t1; \
> -	vperm2i128 $0x20, x1, x0, x0; \
> -	vperm2i128 $0x31, x1, t1, x1;
> -
> -/**********************************************************************
> -  8-way chacha20
> - **********************************************************************/
> -
> -#define ROTATE2(v1,v2,c,tmp)	\
> -	vpsrld $(32 - (c)), v1, tmp;	\
> -	vpslld $(c), v1, v1;		\
> -	vpaddb tmp, v1, v1;		\
> -	vpsrld $(32 - (c)), v2, tmp;	\
> -	vpslld $(c), v2, v2;		\
> -	vpaddb tmp, v2, v2;
> -
> -#define ROTATE_SHUF_2(v1,v2,shuf)	\
> -	vpshufb shuf, v1, v1;		\
> -	vpshufb shuf, v2, v2;
> -
> -#define XOR(ds,s) \
> -	vpxor s, ds, ds;
> -
> -#define PLUS(ds,s) \
> -	vpaddd s, ds, ds;
> -
> -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\
> -		      interleave_op1,interleave_op2,\
> -		      interleave_op3,interleave_op4)		\
> -	vbroadcasti128 .Lshuf_rol16 rRIP, tmp1;			\
> -		interleave_op1;					\
> -	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
> -	    ROTATE_SHUF_2(d1, d2, tmp1);			\
> -		interleave_op2;					\
> -	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
> -	    ROTATE2(b1, b2, 12, tmp1);				\
> -	vbroadcasti128 .Lshuf_rol8 rRIP, tmp1;			\
> -		interleave_op3;					\
> -	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
> -	    ROTATE_SHUF_2(d1, d2, tmp1);			\
> -		interleave_op4;					\
> -	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
> -	    ROTATE2(b1, b2,  7, tmp1);
> -
> -	.section .text.avx2, "ax", @progbits
> -	.align 32
> -chacha20_data:
> -L(shuf_rol16):
> -	.byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
> -L(shuf_rol8):
> -	.byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
> -L(inc_counter):
> -	.byte 0,1,2,3,4,5,6,7
> -L(unsigned_cmp):
> -	.long 0x80000000
> -
> -	.hidden __chacha20_avx2_blocks8
> -ENTRY (__chacha20_avx2_blocks8)
> -	/* input:
> -	 *	%rdi: input
> -	 *	%rsi: dst
> -	 *	%rdx: src
> -	 *	%rcx: nblks (multiple of 8)
> -	 */
> -	vzeroupper;
> -
> -	pushq %rbp;
> -	cfi_adjust_cfa_offset(8);
> -	cfi_rel_offset(rbp, 0)
> -	movq %rsp, %rbp;
> -	cfi_def_cfa_register(rbp);
> -
> -	subq $STACK_MAX, %rsp;
> -	andq $~31, %rsp;
> -
> -L(loop8):
> -	mov $20, ROUND;
> -
> -	/* Construct counter vectors X12 and X13 */
> -	vpmovzxbd L(inc_counter) rRIP, X0;
> -	vpbroadcastd L(unsigned_cmp) rRIP, X2;
> -	vpbroadcastd (12 * 4)(INPUT), X12;
> -	vpbroadcastd (13 * 4)(INPUT), X13;
> -	vpaddd X0, X12, X12;
> -	vpxor X2, X0, X0;
> -	vpxor X2, X12, X1;
> -	vpcmpgtd X1, X0, X0;
> -	vpsubd X0, X13, X13;
> -	vmovdqa X12, (STACK_VEC_X12)(%rsp);
> -	vmovdqa X13, (STACK_VEC_X13)(%rsp);
> -
> -	/* Load vectors */
> -	vpbroadcastd (0 * 4)(INPUT), X0;
> -	vpbroadcastd (1 * 4)(INPUT), X1;
> -	vpbroadcastd (2 * 4)(INPUT), X2;
> -	vpbroadcastd (3 * 4)(INPUT), X3;
> -	vpbroadcastd (4 * 4)(INPUT), X4;
> -	vpbroadcastd (5 * 4)(INPUT), X5;
> -	vpbroadcastd (6 * 4)(INPUT), X6;
> -	vpbroadcastd (7 * 4)(INPUT), X7;
> -	vpbroadcastd (8 * 4)(INPUT), X8;
> -	vpbroadcastd (9 * 4)(INPUT), X9;
> -	vpbroadcastd (10 * 4)(INPUT), X10;
> -	vpbroadcastd (11 * 4)(INPUT), X11;
> -	vpbroadcastd (14 * 4)(INPUT), X14;
> -	vpbroadcastd (15 * 4)(INPUT), X15;
> -	vmovdqa X15, (STACK_TMP)(%rsp);
> -
> -L(round2):
> -	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X15,,,,)
> -	vmovdqa (STACK_TMP)(%rsp), X15;
> -	vmovdqa X8, (STACK_TMP)(%rsp);
> -	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,,,,)
> -	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,,,,)
> -	vmovdqa (STACK_TMP)(%rsp), X8;
> -	vmovdqa X15, (STACK_TMP)(%rsp);
> -	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X15,,,,)
> -	sub $2, ROUND;
> -	jnz L(round2);
> -
> -	vmovdqa X8, (STACK_TMP1)(%rsp);
> -
> -	/* tmp := X15 */
> -	vpbroadcastd (0 * 4)(INPUT), X15;
> -	PLUS(X0, X15);
> -	vpbroadcastd (1 * 4)(INPUT), X15;
> -	PLUS(X1, X15);
> -	vpbroadcastd (2 * 4)(INPUT), X15;
> -	PLUS(X2, X15);
> -	vpbroadcastd (3 * 4)(INPUT), X15;
> -	PLUS(X3, X15);
> -	vpbroadcastd (4 * 4)(INPUT), X15;
> -	PLUS(X4, X15);
> -	vpbroadcastd (5 * 4)(INPUT), X15;
> -	PLUS(X5, X15);
> -	vpbroadcastd (6 * 4)(INPUT), X15;
> -	PLUS(X6, X15);
> -	vpbroadcastd (7 * 4)(INPUT), X15;
> -	PLUS(X7, X15);
> -	transpose_4x4(X0, X1, X2, X3, X8, X15);
> -	transpose_4x4(X4, X5, X6, X7, X8, X15);
> -	vmovdqa (STACK_TMP1)(%rsp), X8;
> -	transpose_16byte_2x2(X0, X4, X15);
> -	transpose_16byte_2x2(X1, X5, X15);
> -	transpose_16byte_2x2(X2, X6, X15);
> -	transpose_16byte_2x2(X3, X7, X15);
> -	vmovdqa (STACK_TMP)(%rsp), X15;
> -	vmovdqu X0, (64 * 0 + 16 * 0)(DST)
> -	vmovdqu X1, (64 * 1 + 16 * 0)(DST)
> -	vpbroadcastd (8 * 4)(INPUT), X0;
> -	PLUS(X8, X0);
> -	vpbroadcastd (9 * 4)(INPUT), X0;
> -	PLUS(X9, X0);
> -	vpbroadcastd (10 * 4)(INPUT), X0;
> -	PLUS(X10, X0);
> -	vpbroadcastd (11 * 4)(INPUT), X0;
> -	PLUS(X11, X0);
> -	vmovdqa (STACK_VEC_X12)(%rsp), X0;
> -	PLUS(X12, X0);
> -	vmovdqa (STACK_VEC_X13)(%rsp), X0;
> -	PLUS(X13, X0);
> -	vpbroadcastd (14 * 4)(INPUT), X0;
> -	PLUS(X14, X0);
> -	vpbroadcastd (15 * 4)(INPUT), X0;
> -	PLUS(X15, X0);
> -	vmovdqu X2, (64 * 2 + 16 * 0)(DST)
> -	vmovdqu X3, (64 * 3 + 16 * 0)(DST)
> -
> -	/* Update counter */
> -	addq $8, (12 * 4)(INPUT);
> -
> -	transpose_4x4(X8, X9, X10, X11, X0, X1);
> -	transpose_4x4(X12, X13, X14, X15, X0, X1);
> -	vmovdqu X4, (64 * 4 + 16 * 0)(DST)
> -	vmovdqu X5, (64 * 5 + 16 * 0)(DST)
> -	transpose_16byte_2x2(X8, X12, X0);
> -	transpose_16byte_2x2(X9, X13, X0);
> -	transpose_16byte_2x2(X10, X14, X0);
> -	transpose_16byte_2x2(X11, X15, X0);
> -	vmovdqu X6,  (64 * 6 + 16 * 0)(DST)
> -	vmovdqu X7,  (64 * 7 + 16 * 0)(DST)
> -	vmovdqu X8,  (64 * 0 + 16 * 2)(DST)
> -	vmovdqu X9,  (64 * 1 + 16 * 2)(DST)
> -	vmovdqu X10, (64 * 2 + 16 * 2)(DST)
> -	vmovdqu X11, (64 * 3 + 16 * 2)(DST)
> -	vmovdqu X12, (64 * 4 + 16 * 2)(DST)
> -	vmovdqu X13, (64 * 5 + 16 * 2)(DST)
> -	vmovdqu X14, (64 * 6 + 16 * 2)(DST)
> -	vmovdqu X15, (64 * 7 + 16 * 2)(DST)
> -
> -	sub $8, NBLKS;
> -	lea (8 * 64)(DST), DST;
> -	lea (8 * 64)(SRC), SRC;
> -	jnz L(loop8);
> -
> -	vzeroupper;
> -
> -	/* eax zeroed by round loop. */
> -	leave;
> -	cfi_adjust_cfa_offset(-8)
> -	cfi_def_cfa_register(%rsp);
> -	ret;
> -	int3;
> -END(__chacha20_avx2_blocks8)
> diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S
> deleted file mode 100644
> index 351a1109c6..0000000000
> --- a/sysdeps/x86_64/chacha20-amd64-sse2.S
> +++ /dev/null
> @@ -1,311 +0,0 @@
> -/* Optimized SSE2 implementation of ChaCha20 cipher.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -/* chacha20-amd64-ssse3.S  -  SSSE3 implementation of ChaCha20 cipher
> -
> -   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
> -
> -   This file is part of Libgcrypt.
> -
> -   Libgcrypt is free software; you can redistribute it and/or modify
> -   it under the terms of the GNU Lesser General Public License as
> -   published by the Free Software Foundation; either version 2.1 of
> -   the License, or (at your option) any later version.
> -
> -   Libgcrypt is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -   GNU Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with this program; if not, see <https://www.gnu.org/licenses/>.
> -*/
> -
> -/* Based on D. J. Bernstein reference implementation at
> -   http://cr.yp.to/chacha.html:
> -
> -   chacha-regs.c version 20080118
> -   D. J. Bernstein
> -   Public domain.  */
> -
> -#include <sysdep.h>
> -#include <isa-level.h>
> -
> -#if MINIMUM_X86_ISA_LEVEL <= 2
> -
> -#ifdef PIC
> -#  define rRIP (%rip)
> -#else
> -#  define rRIP
> -#endif
> -
> -/* 'ret' instruction replacement for straight-line speculation mitigation */
> -#define ret_spec_stop \
> -        ret; int3;
> -
> -/* register macros */
> -#define INPUT %rdi
> -#define DST   %rsi
> -#define SRC   %rdx
> -#define NBLKS %rcx
> -#define ROUND %eax
> -
> -/* stack structure */
> -#define STACK_VEC_X12 (16)
> -#define STACK_VEC_X13 (16 + STACK_VEC_X12)
> -#define STACK_TMP     (16 + STACK_VEC_X13)
> -#define STACK_TMP1    (16 + STACK_TMP)
> -#define STACK_TMP2    (16 + STACK_TMP1)
> -
> -#define STACK_MAX     (16 + STACK_TMP2)
> -
> -/* vector registers */
> -#define X0 %xmm0
> -#define X1 %xmm1
> -#define X2 %xmm2
> -#define X3 %xmm3
> -#define X4 %xmm4
> -#define X5 %xmm5
> -#define X6 %xmm6
> -#define X7 %xmm7
> -#define X8 %xmm8
> -#define X9 %xmm9
> -#define X10 %xmm10
> -#define X11 %xmm11
> -#define X12 %xmm12
> -#define X13 %xmm13
> -#define X14 %xmm14
> -#define X15 %xmm15
> -
> -/**********************************************************************
> -  helper macros
> - **********************************************************************/
> -
> -/* 4x4 32-bit integer matrix transpose */
> -#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \
> -	movdqa    x0, t2; \
> -	punpckhdq x1, t2; \
> -	punpckldq x1, x0; \
> -	\
> -	movdqa    x2, t1; \
> -	punpckldq x3, t1; \
> -	punpckhdq x3, x2; \
> -	\
> -	movdqa     x0, x1; \
> -	punpckhqdq t1, x1; \
> -	punpcklqdq t1, x0; \
> -	\
> -	movdqa     t2, x3; \
> -	punpckhqdq x2, x3; \
> -	punpcklqdq x2, t2; \
> -	movdqa     t2, x2;
> -
> -/* fill xmm register with 32-bit value from memory */
> -#define PBROADCASTD(mem32, xreg) \
> -	movd mem32, xreg; \
> -	pshufd $0, xreg, xreg;
> -
> -/**********************************************************************
> -  4-way chacha20
> - **********************************************************************/
> -
> -#define ROTATE2(v1,v2,c,tmp1,tmp2)	\
> -	movdqa v1, tmp1; 		\
> -	movdqa v2, tmp2; 		\
> -	psrld $(32 - (c)), v1;		\
> -	pslld $(c), tmp1;		\
> -	paddb tmp1, v1;			\
> -	psrld $(32 - (c)), v2;		\
> -	pslld $(c), tmp2;		\
> -	paddb tmp2, v2;
> -
> -#define XOR(ds,s) \
> -	pxor s, ds;
> -
> -#define PLUS(ds,s) \
> -	paddd s, ds;
> -
> -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2)	\
> -	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
> -	    ROTATE2(d1, d2, 16, tmp1, tmp2);			\
> -	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
> -	    ROTATE2(b1, b2, 12, tmp1, tmp2);			\
> -	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
> -	    ROTATE2(d1, d2, 8, tmp1, tmp2);			\
> -	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
> -	    ROTATE2(b1, b2,  7, tmp1, tmp2);
> -
> -	.section .text.sse2,"ax",@progbits
> -
> -chacha20_data:
> -	.align 16
> -L(counter1):
> -	.long 1,0,0,0
> -L(inc_counter):
> -	.long 0,1,2,3
> -L(unsigned_cmp):
> -	.long 0x80000000,0x80000000,0x80000000,0x80000000
> -
> -	.hidden __chacha20_sse2_blocks4
> -ENTRY (__chacha20_sse2_blocks4)
> -	/* input:
> -	 *	%rdi: input
> -	 *	%rsi: dst
> -	 *	%rdx: src
> -	 *	%rcx: nblks (multiple of 4)
> -	 */
> -
> -	pushq %rbp;
> -	cfi_adjust_cfa_offset(8);
> -	cfi_rel_offset(rbp, 0)
> -	movq %rsp, %rbp;
> -	cfi_def_cfa_register(%rbp);
> -
> -	subq $STACK_MAX, %rsp;
> -	andq $~15, %rsp;
> -
> -L(loop4):
> -	mov $20, ROUND;
> -
> -	/* Construct counter vectors X12 and X13 */
> -	movdqa L(inc_counter) rRIP, X0;
> -	movdqa L(unsigned_cmp) rRIP, X2;
> -	PBROADCASTD((12 * 4)(INPUT), X12);
> -	PBROADCASTD((13 * 4)(INPUT), X13);
> -	paddd X0, X12;
> -	movdqa X12, X1;
> -	pxor X2, X0;
> -	pxor X2, X1;
> -	pcmpgtd X1, X0;
> -	psubd X0, X13;
> -	movdqa X12, (STACK_VEC_X12)(%rsp);
> -	movdqa X13, (STACK_VEC_X13)(%rsp);
> -
> -	/* Load vectors */
> -	PBROADCASTD((0 * 4)(INPUT), X0);
> -	PBROADCASTD((1 * 4)(INPUT), X1);
> -	PBROADCASTD((2 * 4)(INPUT), X2);
> -	PBROADCASTD((3 * 4)(INPUT), X3);
> -	PBROADCASTD((4 * 4)(INPUT), X4);
> -	PBROADCASTD((5 * 4)(INPUT), X5);
> -	PBROADCASTD((6 * 4)(INPUT), X6);
> -	PBROADCASTD((7 * 4)(INPUT), X7);
> -	PBROADCASTD((8 * 4)(INPUT), X8);
> -	PBROADCASTD((9 * 4)(INPUT), X9);
> -	PBROADCASTD((10 * 4)(INPUT), X10);
> -	PBROADCASTD((11 * 4)(INPUT), X11);
> -	PBROADCASTD((14 * 4)(INPUT), X14);
> -	PBROADCASTD((15 * 4)(INPUT), X15);
> -	movdqa X11, (STACK_TMP)(%rsp);
> -	movdqa X15, (STACK_TMP1)(%rsp);
> -
> -L(round2_4):
> -	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X11,X15)
> -	movdqa (STACK_TMP)(%rsp), X11;
> -	movdqa (STACK_TMP1)(%rsp), X15;
> -	movdqa X8, (STACK_TMP)(%rsp);
> -	movdqa X9, (STACK_TMP1)(%rsp);
> -	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,X9)
> -	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,X9)
> -	movdqa (STACK_TMP)(%rsp), X8;
> -	movdqa (STACK_TMP1)(%rsp), X9;
> -	movdqa X11, (STACK_TMP)(%rsp);
> -	movdqa X15, (STACK_TMP1)(%rsp);
> -	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X11,X15)
> -	sub $2, ROUND;
> -	jnz L(round2_4);
> -
> -	/* tmp := X15 */
> -	movdqa (STACK_TMP)(%rsp), X11;
> -	PBROADCASTD((0 * 4)(INPUT), X15);
> -	PLUS(X0, X15);
> -	PBROADCASTD((1 * 4)(INPUT), X15);
> -	PLUS(X1, X15);
> -	PBROADCASTD((2 * 4)(INPUT), X15);
> -	PLUS(X2, X15);
> -	PBROADCASTD((3 * 4)(INPUT), X15);
> -	PLUS(X3, X15);
> -	PBROADCASTD((4 * 4)(INPUT), X15);
> -	PLUS(X4, X15);
> -	PBROADCASTD((5 * 4)(INPUT), X15);
> -	PLUS(X5, X15);
> -	PBROADCASTD((6 * 4)(INPUT), X15);
> -	PLUS(X6, X15);
> -	PBROADCASTD((7 * 4)(INPUT), X15);
> -	PLUS(X7, X15);
> -	PBROADCASTD((8 * 4)(INPUT), X15);
> -	PLUS(X8, X15);
> -	PBROADCASTD((9 * 4)(INPUT), X15);
> -	PLUS(X9, X15);
> -	PBROADCASTD((10 * 4)(INPUT), X15);
> -	PLUS(X10, X15);
> -	PBROADCASTD((11 * 4)(INPUT), X15);
> -	PLUS(X11, X15);
> -	movdqa (STACK_VEC_X12)(%rsp), X15;
> -	PLUS(X12, X15);
> -	movdqa (STACK_VEC_X13)(%rsp), X15;
> -	PLUS(X13, X15);
> -	movdqa X13, (STACK_TMP)(%rsp);
> -	PBROADCASTD((14 * 4)(INPUT), X15);
> -	PLUS(X14, X15);
> -	movdqa (STACK_TMP1)(%rsp), X15;
> -	movdqa X14, (STACK_TMP1)(%rsp);
> -	PBROADCASTD((15 * 4)(INPUT), X13);
> -	PLUS(X15, X13);
> -	movdqa X15, (STACK_TMP2)(%rsp);
> -
> -	/* Update counter */
> -	addq $4, (12 * 4)(INPUT);
> -
> -	TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15);
> -	movdqu X0, (64 * 0 + 16 * 0)(DST)
> -	movdqu X1, (64 * 1 + 16 * 0)(DST)
> -	movdqu X2, (64 * 2 + 16 * 0)(DST)
> -	movdqu X3, (64 * 3 + 16 * 0)(DST)
> -	TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2);
> -	movdqa (STACK_TMP)(%rsp), X13;
> -	movdqa (STACK_TMP1)(%rsp), X14;
> -	movdqa (STACK_TMP2)(%rsp), X15;
> -	movdqu X4, (64 * 0 + 16 * 1)(DST)
> -	movdqu X5, (64 * 1 + 16 * 1)(DST)
> -	movdqu X6, (64 * 2 + 16 * 1)(DST)
> -	movdqu X7, (64 * 3 + 16 * 1)(DST)
> -	TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2);
> -	movdqu X8,  (64 * 0 + 16 * 2)(DST)
> -	movdqu X9,  (64 * 1 + 16 * 2)(DST)
> -	movdqu X10, (64 * 2 + 16 * 2)(DST)
> -	movdqu X11, (64 * 3 + 16 * 2)(DST)
> -	TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2);
> -	movdqu X12, (64 * 0 + 16 * 3)(DST)
> -	movdqu X13, (64 * 1 + 16 * 3)(DST)
> -	movdqu X14, (64 * 2 + 16 * 3)(DST)
> -	movdqu X15, (64 * 3 + 16 * 3)(DST)
> -
> -	sub $4, NBLKS;
> -	lea (4 * 64)(DST), DST;
> -	lea (4 * 64)(SRC), SRC;
> -	jnz L(loop4);
> -
> -	/* eax zeroed by round loop. */
> -	leave;
> -	cfi_adjust_cfa_offset(-8)
> -	cfi_def_cfa_register(%rsp);
> -	ret_spec_stop;
> -END (__chacha20_sse2_blocks4)
> -
> -#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */
> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
> deleted file mode 100644
> index 6f3784e392..0000000000
> --- a/sysdeps/x86_64/chacha20_arch.h
> +++ /dev/null
> @@ -1,55 +0,0 @@
> -/* Chacha20 implementation, used on arc4random.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <isa-level.h>
> -#include <ldsodefs.h>
> -#include <cpu-features.h>
> -#include <sys/param.h>
> -
> -unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst,
> -				      const uint8_t *src, size_t nblks)
> -     attribute_hidden;
> -unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
> -				      const uint8_t *src, size_t nblks)
> -     attribute_hidden;
> -
> -static inline void
> -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> -		size_t bytes)
> -{
> -  _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0,
> -		  "CHACHA20_BUFSIZE not multiple of 4 or 8");
> -  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
> -		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
> -
> -#if MINIMUM_X86_ISA_LEVEL > 2
> -  __chacha20_avx2_blocks8 (state, dst, src,
> -			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -#else
> -  const struct cpu_features* cpu_features = __get_cpu_features ();
> -
> -  /* AVX2 version uses vzeroupper, so disable it if RTM is enabled.  */
> -  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> -      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !))
> -    __chacha20_avx2_blocks8 (state, dst, src,
> -			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -  else
> -    __chacha20_sse2_blocks4 (state, dst, src,
> -			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -#endif
> -}

* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 11:33       ` Adhemerval Zanella Netto
@ 2022-07-26 11:54         ` Jason A. Donenfeld
  2022-07-26 12:08           ` Jason A. Donenfeld
                             ` (2 more replies)
  0 siblings, 3 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 11:54 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Florian Weimer, Cristian Rodríguez, Paul Eggert,
	linux-crypto

Hi Adhemerval,

Thanks for your review.

On Tue, Jul 26, 2022 at 08:33:23AM -0300, Adhemerval Zanella Netto wrote:
> Ther are some missing pieces, like sysdeps/unix/sysv/linux/tls-internal.h comment,
> sysdeps/generic/tls-internal-struct.h generic piece (it is used on hurd build),
> maybe also change the NEWS to state this is not a CSPRNG, and we definitely need
> to update the manual. Some comments below.

I think Eric already pointed those out, and they're fixed in v3 now.
PTAL.

> > +  static bool have_getrandom = true, seen_initialized = false;
> > +  int fd;
> 
> I think it should reasonable to assume that getrandom syscall will be always
> supported and using arc4random in an enviroment with filtered getrandom does
> not make much sense.  We are trying to avoid add this static syscall checks
> where possible,

I don't know glibc's requirements for kernels, though I do know that
it'd be nice to not have to write this fallback code in every program I
write and just use libc's thing. So in that sense, having the fallback
to /dev/urandom makes arc4random_buf a lot more useful. But with that
said, yea, maybe we shouldn't care about old kernels? getrandom is now
quite old and the stable kernels on kernel.org all have it.

From my perspective, I don't have a strongly developed opinion on what
makes sense for glibc. If Florian agrees with you, I'll send a v+1 with
the fallback code removed. If it's contentious, maybe the fallback code
should stay in and we can slate it for removal on another day, when the
minimum glibc kernel version gets raised or something like that.

> also plain load/store to se the static have_getrandom
> is strickly a race-condition, although it should not really matter (we use
> relaxed load/store in such optimization (check
> sysdeps/unix/sysv/linux/mips/mips64/getdents64.c).

I was aware of the race but figured it didn't matter, since two racing
threads will both set it to the same result eventually. But I didn't
know about the convention of using those relaxed wrapper functions.
Thanks for the tip. I'll do that for v4.
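
For readers outside glibc, the same "cache the probe result in a static
flag" pattern can be sketched with C11 <stdatomic.h>. This is only an
illustration: glibc itself uses its internal atomic_load_relaxed and
atomic_store_relaxed macros, and the names feature_available,
probe_feature and try_fast_path below are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical cached probe result; starts optimistic, like have_getrandom.  */
static atomic_bool feature_available = true;

/* Hypothetical stand-in for a syscall probe that reports "not supported".  */
static bool
probe_feature (void)
{
  return false;
}

/* Returns true if the fast path was used, false if the caller must fall
   back.  */
static bool
try_fast_path (void)
{
  if (!atomic_load_explicit (&feature_available, memory_order_relaxed))
    return false;
  if (probe_feature ())
    return true;
  /* Racing threads all store the same value, so relaxed ordering is
     enough; no other data is published through this flag.  */
  atomic_store_explicit (&feature_available, false, memory_order_relaxed);
  return false;
}
```

The relaxed accesses do not change the outcome compared with plain
loads and stores here; they just make the concurrent access well
defined under the C memory model.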

> Also, does it make sense to fallback if we build for a kernel that should
> always support getrandom?

I guess only if syscall filtering is a concern. But if not, then maybe
yea? We could do this in a follow-up commit, or I could do this in v4.
Would `#if __LINUX_KERNEL_VERSION >` be the right mechanism to use here?
If so, I think the way I'd implement that would be:

diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
index 978bf9287f..a33d9ff2c5 100644
--- a/stdlib/arc4random.c
+++ b/stdlib/arc4random.c
@@ -44,8 +44,10 @@ __arc4random_buf (void *p, size_t n)
     {
       ssize_t l;

+#if __LINUX_KERNEL_VERSION < something
       if (!atomic_load_relaxed (&have_getrandom))
 	break;
+#endif

       l = __getrandom_nocancel (p, n, 0);
       if (l > 0)
@@ -60,11 +62,13 @@ __arc4random_buf (void *p, size_t n)
 	arc4random_getrandom_failure (); /* Weird, should never happen. */
       else if (l == -EINTR)
 	continue; /* Interrupted by a signal; keep going. */
+#if __LINUX_KERNEL_VERSION < something
       else if (l == -ENOSYS)
 	{
 	  atomic_store_relaxed (&have_getrandom, false);
 	  break; /* No syscall, so fallback to /dev/urandom. */
 	}
+#endif
       arc4random_getrandom_failure (); /* Unknown error, should never happen. */
     }

And then arc4random_getrandom_failure() being a noreturn function would
make gcc optimize out the rest.

Does that seem like a good approach?

> > +      l = __getrandom_nocancel (p, n, 0);
> 
> Do we need to worry about a potentially uncancellable blocking call here? I guess
> using GRND_NONBLOCK does not really help.

No, generally not. Also, keep in mind that getrandom(0) will trigger
jitter entropy if the kernel isn't already initialized.
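
As a rough illustration of why GRND_NONBLOCK does not buy much here,
the sketch below uses the public getrandom() wrapper from
<sys/random.h> rather than glibc's internal __getrandom_nocancel. The
function name try_getrandom_nonblock is hypothetical. With the flag
set, an uninitialized kernel RNG yields EAGAIN instead of blocking, so
the caller still has no bytes to hand out.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Returns 1 if BUF was filled, 0 if the kernel RNG is not yet initialized
   (EAGAIN), and -1 on any other error or on a short read.  */
static int
try_getrandom_nonblock (void *buf, size_t n)
{
  ssize_t l = getrandom (buf, n, GRND_NONBLOCK);
  if (l >= 0)
    return (size_t) l == n ? 1 : -1;
  return errno == EAGAIN ? 0 : -1;
}
```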

> 
> > +      if (l > 0)
> > +	{
> > +	  if ((size_t) l == n)
> 
> Do we need the cast here?

Generally it's frowned upon to have implicit signed conversion, right? l
is signed while n is unsigned.
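
A small standalone sketch of the concern, with hypothetical helper
names. Converting a negative ssize_t to size_t yields a huge value, so
the comparison is only meaningful once the sign has been checked, which
the l > 0 test in the patch already does.

```c
#include <stddef.h>
#include <sys/types.h>

/* Returns 1 if a read result L covers exactly N bytes.  The sign is
   checked first, so the cast to size_t never sees a negative value.  */
static int
read_is_complete (ssize_t l, size_t n)
{
  return l > 0 && (size_t) l == n;
}

/* Without the sign check, -1 converts to SIZE_MAX before the comparison,
   whether the conversion is written out or left implicit.  */
static int
naive_compare (ssize_t l, size_t n)
{
  return (size_t) l == n;
}
```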

> 
> > +	    return; /* Done reading, success. */
> 
> Minor style issue: use double space before period.

I was really confused by this, and then opened up some other files and
saw you meant *after* period. :) Will do for v4.

> As Florian said we will need a non cancellable poll here.  Since you are setting
> the timeout as undefined, I think it would be simple to just add a non cancellable
> wrapper as:
> 
>   int __ppoll_noncancel_notimeout (struct pollfd *fds, nfds_t nfds)
>   {
>   #ifndef __NR_ppoll_time64
>   # define __NR_ppoll_time64 __NR_ppoll
>   #endif
>      return INLINE_SYSCALL_CALL (__NR_ppoll_time64, fds, nfds, NULL, NULL, 0);
>   }
> 
> So we don't need to handle the timeout for 64-bit time_t wrappers.

Oh that sounds like a good solution to the time64 situation. I'll do
that for v4... BUT, I already implemented possibly the wrong solution
for v3. Could you take a look at what I did there and confirm that it's
wrong? If so, then I'll do exactly what you suggested here.

Thanks again for the review,
Jason

* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 11:54         ` Jason A. Donenfeld
@ 2022-07-26 12:08           ` Jason A. Donenfeld
  2022-07-26 12:20           ` Jason A. Donenfeld
  2022-07-26 12:34           ` Adhemerval Zanella Netto
  2 siblings, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 12:08 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Florian Weimer, Cristian Rodríguez, Paul Eggert,
	linux-crypto

Hey again,

On Tue, Jul 26, 2022 at 01:54:23PM +0200, Jason A. Donenfeld wrote:
> > As Florian said we will need a non cancellable poll here.  Since you are setting
> > the timeout as undefined, I think it would be simple to just add a non cancellable
> > wrapper as:
> > 
> >   int __ppoll_noncancel_notimeout (struct pollfd *fds, nfds_t nfds)
> >   {
> >   #ifndef __NR_ppoll_time64
> >   # define __NR_ppoll_time64 __NR_ppoll
> >   #endif
> >      return INLINE_SYSCALL_CALL (__NR_ppoll_time64, fds, nfds, NULL, NULL, 0);
> >   }
> > 
> > So we don't need to handle the timeout for 64-bit time_t wrappers.
> 
> Oh that sounds like a good solution to the time64 situation. I'll do
> that for v4... BUT, I already implemented possibly the wrong solution
> for v3. Could you take a look at what I did there and confirm that it's
> wrong? If so, then I'll do exactly what you suggested here.

Actually, forget my v3. What you're suggesting is also better because
it's ppoll, not poll, as poll isn't on all platforms. So I'll do things
exactly as you've described for v4.

Jason

* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 11:54         ` Jason A. Donenfeld
  2022-07-26 12:08           ` Jason A. Donenfeld
@ 2022-07-26 12:20           ` Jason A. Donenfeld
  2022-07-26 12:34           ` Adhemerval Zanella Netto
  2 siblings, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 12:20 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Florian Weimer, Cristian Rodríguez, Paul Eggert,
	linux-crypto

On Tue, Jul 26, 2022 at 01:54:23PM +0200, Jason A. Donenfeld wrote:
> > Also, does it make sense to fallback if we build for a kernel that should
> > always support getrandom?
> 
> I guess only if syscall filtering is a concern. But if not, then maybe
> yea? We could do this in a follow-up commit, or I could do this in v4.
> Would `#if __LINUX_KERNEL_VERSION >` be the right mechanism to use here?
> If so, I think the way I'd implement that would be:
>
> [...]
>
> And then arc4random_getrandom_failure() being a noreturn function would
> make gcc optimize out the rest.
> 
> Does that seem like a good approach?

It actually winds up looking a bit more like the below. Let me know if
you want that in v4.

diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
index c0f132ea9b..8fcf41e7de 100644
--- a/stdlib/arc4random.c
+++ b/stdlib/arc4random.c
@@ -43,7 +43,7 @@ __arc4random_buf (void *p, size_t n)
     {
       ssize_t l;

-      if (!atomic_load_relaxed (&have_getrandom))
+      if (!__ASSUME_GETRANDOM && !atomic_load_relaxed (&have_getrandom))
 	break;

       l = __getrandom_nocancel (p, n, 0);
@@ -59,7 +59,7 @@ __arc4random_buf (void *p, size_t n)
 	arc4random_getrandom_failure (); /* Weird, should never happen.  */
       else if (l == -EINTR)
 	continue; /* Interrupted by a signal; keep going.  */
-      else if (l == -ENOSYS)
+      else if (!__ASSUME_GETRANDOM && l == -ENOSYS)
 	{
 	  atomic_store_relaxed (&have_getrandom, false);
 	  break; /* No syscall, so fallback to /dev/urandom.  */
diff --git a/sysdeps/unix/sysv/linux/kernel-features.h b/sysdeps/unix/sysv/linux/kernel-features.h
index 74adc3956b..75d5f953d4 100644
--- a/sysdeps/unix/sysv/linux/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/kernel-features.h
@@ -236,4 +236,11 @@
 # define __ASSUME_FUTEX_LOCK_PI2 0
 #endif

+/* The getrandom() syscall was added in 3.17.  */
+#if __LINUX_KERNEL_VERSION >= 0x031100
+# define __ASSUME_GETRANDOM 1
+#else
+# define __ASSUME_GETRANDOM 0
+#endif
+
 #endif /* kernel-features.h */
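
For reference, the 0x031100 constant packs major, minor and patch level
into one byte each, so 3.17.0 becomes 0x03, 0x11, 0x00. A minimal
sketch of that encoding, using a hypothetical EXAMPLE_KERNEL_VERSION
macro that is not part of glibc:

```c
/* Hypothetical helper mirroring the 0xMMmmpp layout used above.  */
#define EXAMPLE_KERNEL_VERSION(major, minor, patch) \
  (((major) << 16) | ((minor) << 8) | (patch))

/* 3.17.0: minor 17 is 0x11, hence 0x031100.  */
_Static_assert (EXAMPLE_KERNEL_VERSION (3, 17, 0) == 0x031100,
		"3.17 encoding");
/* 3.2.0 encodes as 0x030200 and compares lower than 3.17.0.  */
_Static_assert (EXAMPLE_KERNEL_VERSION (3, 2, 0) == 0x030200,
		"3.2 encoding");
```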

* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 11:54         ` Jason A. Donenfeld
  2022-07-26 12:08           ` Jason A. Donenfeld
  2022-07-26 12:20           ` Jason A. Donenfeld
@ 2022-07-26 12:34           ` Adhemerval Zanella Netto
  2022-07-26 12:47             ` Jason A. Donenfeld
  2 siblings, 1 reply; 81+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-26 12:34 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: libc-alpha, Florian Weimer, Cristian Rodríguez, Paul Eggert,
	linux-crypto



On 26/07/22 08:54, Jason A. Donenfeld wrote:
> Hi Adhemerval,
> 
> Thanks for your review.
> 
> On Tue, Jul 26, 2022 at 08:33:23AM -0300, Adhemerval Zanella Netto wrote:
>> Ther are some missing pieces, like sysdeps/unix/sysv/linux/tls-internal.h comment,
>> sysdeps/generic/tls-internal-struct.h generic piece (it is used on hurd build),
>> maybe also change the NEWS to state this is not a CSPRNG, and we definitely need
>> to update the manual. Some comments below.
> 
> I think Eric already pointed those out, and they're fixed in v3 now.
> PTAL.
> 
>>> +  static bool have_getrandom = true, seen_initialized = false;
>>> +  int fd;
>>
>> I think it should reasonable to assume that getrandom syscall will be always
>> supported and using arc4random in an enviroment with filtered getrandom does
>> not make much sense.  We are trying to avoid add this static syscall checks
>> where possible,
> 
> I don't know glibc's requirements for kernels, though I do know that
> it'd be nice to not have to write this fallback code in every program I
> write and just use libc's thing. So in that sense, having the fallback
> to /dev/urandom makes arc4random_buf a lot more useful. But with that
> said, yea, maybe we shouldn't care about old kernels? getrandom is now
> quite old and the stable kernels on kernel.org all have it.

We do not enforce a kernel version anymore, although we still support
--enable-kernel=x.y, which changes which syscalls glibc internally assumes
to be present (so there is no need for a fallback in that case).

So the question is whether we need the fallback code for --enable-kernel=3.17.
If the kernel returns ENOSYS in this case (and assuming you are running on a
kernel newer than 3.17), it means some syscall filtering is in place, and I am
not sure we need to handle that.  The main idea of adding this minor
optimization is that once we increase the minimum supported kernel, we can
clean this code up.

> 
> From my perspective, I don't have a strongly developed opinion on what
> makes sense for glibc. If Florian agrees with you, I'll send a v+1 with
> the fallback code removed. If it's contentious, maybe the fallback code
> should stay in and we can slate it for removal on another day, when the
> minimum glibc kernel version gets raised or something like that.

I think the fallback code makes sense, since the minimum kernel we still
support is 3.2, although I am not sure how getrandom and/or /dev/urandom
will behave on such old kernels.

> 
>> also plain load/store to se the static have_getrandom
>> is strickly a race-condition, although it should not really matter (we use
>> relaxed load/store in such optimization (check
>> sysdeps/unix/sysv/linux/mips/mips64/getdents64.c).
> 
> I was aware of the race but figured it didn't matter, since two racing
> threads will both set it to the same result eventually. But I didn't
> know about the convention of using those relaxed wrapper functions.
> Thanks for the tip. I'll do that for v4.
> 
>> Also, does it make sense to fallback if we build for a kernel that should
>> always support getrandom?
> 
> I guess only if syscall filtering is a concern. But if not, then maybe
> yea? We could do this in a follow-up commit, or I could do this in v4.
> Would `#if __LINUX_KERNEL_VERSION >` be the right mechanism to use here?
> If so, I think the way I'd implement that would be:
> 
> diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
> index 978bf9287f..a33d9ff2c5 100644
> --- a/stdlib/arc4random.c
> +++ b/stdlib/arc4random.c
> @@ -44,8 +44,10 @@ __arc4random_buf (void *p, size_t n)
>      {
>        ssize_t l;
> 
> +#if __LINUX_KERNEL_VERSION < something
>        if (!atomic_load_relaxed (&have_getrandom))
>  	break;
> +#endif
> 
>        l = __getrandom_nocancel (p, n, 0);
>        if (l > 0)
> @@ -60,11 +62,13 @@ __arc4random_buf (void *p, size_t n)
>  	arc4random_getrandom_failure (); /* Weird, should never happen. */
>        else if (l == -EINTR)
>  	continue; /* Interrupted by a signal; keep going. */
> +#if __LINUX_KERNEL_VERSION < something
>        else if (l == -ENOSYS)
>  	{
>  	  atomic_store_relaxed (&have_getrandom, false);
>  	  break; /* No syscall, so fallback to /dev/urandom. */
>  	}
> +#endif
>        arc4random_getrandom_failure (); /* Unknown error, should never happen. */
>      }
> 
> And then arc4random_getrandom_failure() being a noreturn function would
> make gcc optimize out the rest.
> 
> Does that seem like a good approach?

I think so, although since __LINUX_KERNEL_VERSION is Linux-only, that should
be moved to sysdeps/unix/sysv/linux.

Usually we do this as a wrapper (static inline or hidden symbol), with the
generic implementation in sysdeps/generic or include, and Linux redefining it
in its own folder.

We also use __ASSUME macros (check sysdeps/unix/sysv/linux/kernel-features.h);
it should be something like __ASSUME_GETRANDOM (we did not have a use for it
because we do not want a fallback in the getrandom implementation).

So I would add something like:

sysdeps/unix/sysv/linux/arc4random_impl.h


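  /* Returns 1 on success, 0 if the caller should fall back, -1 on error.  */
  static inline int getentropy_arch (void *p, size_t n)
  {
    for (;;)
      {
        ssize_t l = __getrandom_nocancel (p, n, 0);
        if (l > 0)
          {
            if ((size_t) l == n)
              return 1;
            /* Partial read; advance and keep going.  */
            p = (unsigned char *) p + l;
            n -= (size_t) l;
            continue;
          }
        else if (l == 0)
          return -1;
        else if (l == -EINTR)
          continue;

  #if !__ASSUME_GETRANDOM
        if (l == -ENOSYS)
          return 0;
  #endif
        return -1;
      }
  }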

And on stdlib/arc4random.c:

  void
  __arc4random_buf (void *p, size_t n)
  {
    if (n == 0)
      return;

    int s = getentropy_arch (p, n);
    if (s > 0)
      return;
    if (s < 0)
      arc4random_getrandom_failure ();

    /* Fallback.  */
  }

> 
>>> +      l = __getrandom_nocancel (p, n, 0);
>>
>> Do we need to worry about a potentially uncancellable blocking call here? I guess
>> using GRND_NONBLOCK does not really help.
> 
> No, generally not. Also, keep in mind that getrandom(0) will trigger
> jitter entropy if the kernel isn't already initialized.

Maybe add a comment stating it.

> 
>>
>>> +      if (l > 0)
>>> +	{
>>> +	  if ((size_t) l == n)
>>
>> Do we need the cast here?
> 
> Generally it's frowned upon to have implicit signed conversion, right? l
> is signed while n is unsigned.

Good question, I don't think we enforce it in fact.

> 
>>
>>> +	    return; /* Done reading, success. */
>>
>> Minor style issue: use double space before period.
> 
> I was really confused by this, and then opened up some other files and
> saw you meant *after* period. :) Will do for v4.

Yeah, I meant after indeed.

> 
>> As Florian said we will need a non cancellable poll here.  Since you are setting
>> the timeout as undefined, I think it would be simple to just add a non cancellable
>> wrapper as:
>>
>>   int __ppoll_noncancel_notimeout (struct pollfd *fds, nfds_t nfds)
>>   {
>>   #ifndef __NR_ppoll_time64
>>   # define __NR_ppoll_time64 __NR_ppoll
>>   #endif
>>      return INLINE_SYSCALL_CALL (__NR_ppoll_time64, fds, nfds, NULL, NULL, 0);
>>   }
>>
>> So we don't need to handle the timeout for 64-bit time_t wrappers.
> 
> Oh that sounds like a good solution to the time64 situation. I'll do
> that for v4... BUT, I already implemented possibly the wrong solution
> for v3. Could you take a look at what I did there and confirm that it's
> wrong? If so, then I'll do exactly what you suggested here.
> 
> Thanks again for the review,
> Jason

* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 12:34           ` Adhemerval Zanella Netto
@ 2022-07-26 12:47             ` Jason A. Donenfeld
  2022-07-26 13:11               ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 12:47 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Florian Weimer, Cristian Rodríguez, Paul Eggert,
	linux-crypto

Hi Adhemerval,

On Tue, Jul 26, 2022 at 09:34:57AM -0300, Adhemerval Zanella Netto wrote:
> kernel newer than 3.17) it means some syscall filtering, and I am not sure
> we should need to actually handle it.

One thing to keep in mind is that people who use CUSE-based /dev/urandom
implementations might not like this, as it means they'd also have to
intercept getrandom() rather than just ENOSYS'ing it. But maybe that's
fine. I don't know of anyone actually doing this in the real world at
the moment.

Jason

* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 12:47             ` Jason A. Donenfeld
@ 2022-07-26 13:11               ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 81+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-26 13:11 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: libc-alpha, Florian Weimer, Cristian Rodríguez, Paul Eggert,
	linux-crypto



On 26/07/22 09:47, Jason A. Donenfeld wrote:
> Hi Adhemerval,
> 
> On Tue, Jul 26, 2022 at 09:34:57AM -0300, Adhemerval Zanella Netto wrote:
>> kernel newer than 3.17) it means some syscall filtering, and I am not sure
>> we should need to actually handle it.
> 
> One thing to keep in mind is that people who use CUSE-based /dev/urandom
> implementations might not like this, as it means they'd also have to
> intercept getrandom() rather than just ENOSYS'ing it. But maybe that's
> fine. I don't know of anyone actually doing this in the real world at
> the moment.
> 

I think it is a fair assumption that if you are trying to implement your own
character device in userland, you should know the implications for the
environment.  From glibc's standpoint, and I would say for this whole thread,
we should assume that getrandom is the de facto API for entropy.

* [PATCH v4] arc4random: simplify design for better safety
  2022-07-25 22:57   ` [PATCH] arc4random: simplify design for better safety Jason A. Donenfeld
  2022-07-25 23:11     ` Jason A. Donenfeld
  2022-07-25 23:28     ` [PATCH v2] " Jason A. Donenfeld
@ 2022-07-26 13:30     ` Jason A. Donenfeld
  2022-07-26 15:21       ` Yann Droneaud
                         ` (2 more replies)
  2 siblings, 3 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 13:30 UTC (permalink / raw)
  To: libc-alpha
  Cc: Jason A. Donenfeld, Adhemerval Zanella Netto, Florian Weimer,
	Cristian Rodríguez, Paul Eggert, Mark Harris, Eric Biggers,
	linux-crypto

Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc can work
together on optimizing that.

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering
altogether. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.

This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things. Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fall back to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, but it does to a decent swath of them, so generally
it's the best approximation we can do.

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 LICENSES                                      |  23 -
 NEWS                                          |   4 +-
 include/stdlib.h                              |   3 -
 manual/math.texi                              |  13 +-
 stdlib/Makefile                               |   2 -
 stdlib/arc4random.c                           | 205 ++-----
 stdlib/arc4random.h                           |  48 --
 stdlib/chacha20.c                             | 191 ------
 stdlib/tst-arc4random-chacha20.c              | 167 -----
 sysdeps/aarch64/Makefile                      |   4 -
 sysdeps/aarch64/chacha20-aarch64.S            | 314 ----------
 sysdeps/aarch64/chacha20_arch.h               |  40 --
 sysdeps/generic/tls-internal-struct.h         |   1 -
 sysdeps/generic/tls-internal.c                |  10 -
 sysdeps/mach/hurd/_Fork.c                     |   2 -
 sysdeps/mach/hurd/kernel-features.h           |   1 +
 sysdeps/nptl/_Fork.c                          |   2 -
 .../powerpc/powerpc64/be/multiarch/Makefile   |   4 -
 .../powerpc64/be/multiarch/chacha20-ppc.c     |   1 -
 .../powerpc64/be/multiarch/chacha20_arch.h    |  42 --
 sysdeps/powerpc/powerpc64/power8/Makefile     |   5 -
 .../powerpc/powerpc64/power8/chacha20-ppc.c   | 256 --------
 .../powerpc/powerpc64/power8/chacha20_arch.h  |  37 --
 sysdeps/s390/s390-64/Makefile                 |   6 -
 sysdeps/s390/s390-64/chacha20-s390x.S         | 573 ------------------
 sysdeps/s390/s390-64/chacha20_arch.h          |  45 --
 sysdeps/unix/sysv/linux/Makefile              |   3 +-
 sysdeps/unix/sysv/linux/Versions              |   1 +
 sysdeps/unix/sysv/linux/kernel-features.h     |   7 +
 sysdeps/unix/sysv/linux/not-cancel.h          |   6 +
 .../sysv/linux/ppoll_nocancel.c}              |  19 +-
 sysdeps/unix/sysv/linux/tls-internal.c        |  10 -
 sysdeps/unix/sysv/linux/tls-internal.h        |   1 -
 sysdeps/x86_64/Makefile                       |   7 -
 sysdeps/x86_64/chacha20-amd64-avx2.S          | 328 ----------
 sysdeps/x86_64/chacha20-amd64-sse2.S          | 311 ----------
 sysdeps/x86_64/chacha20_arch.h                |  55 --
 37 files changed, 89 insertions(+), 2658 deletions(-)
 delete mode 100644 stdlib/arc4random.h
 delete mode 100644 stdlib/chacha20.c
 delete mode 100644 stdlib/tst-arc4random-chacha20.c
 delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S
 delete mode 100644 sysdeps/aarch64/chacha20_arch.h
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
 delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
 delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
 delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
 delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
 rename sysdeps/{generic/chacha20_arch.h => unix/sysv/linux/ppoll_nocancel.c} (62%)
 delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
 delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
 delete mode 100644 sysdeps/x86_64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index cd04fb6e84..530893b1dc 100644
--- a/LICENSES
+++ b/LICENSES
@@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov>
  You should have received a copy of the GNU Lesser General Public
  License along with this library; if not, see
  <https://www.gnu.org/licenses/>.  */
-\f
-sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S,
-sysdeps/x86_64/chacha20-amd64-avx2.S, and
-sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and
-sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt,
-with the following notices:
-
-Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-This file is part of Libgcrypt.
-
-Libgcrypt is free software; you can redistribute it and/or modify
-it under the terms of the GNU Lesser General Public License as
-published by the Free Software Foundation; either version 2.1 of
-the License, or (at your option) any later version.
-
-Libgcrypt is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU Lesser General Public License for more details.
-
-You should have received a copy of the GNU Lesser General Public
-License along with this program; if not, see <https://www.gnu.org/licenses/>.
diff --git a/NEWS b/NEWS
index 8420a65cd0..fe531bfe1e 100644
--- a/NEWS
+++ b/NEWS
@@ -61,8 +61,8 @@ Major new features:
   is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type).
 
 * The functions arc4random, arc4random_buf, and arc4random_uniform have been
-  added.  The functions use a pseudo-random number generator along with
-  entropy from the kernel.
+  added.  The functions wrap getrandom and/or /dev/urandom to return high-
+  quality randomness from the kernel.
 
 Deprecated and removed features, and other changes affecting compatibility:
 
diff --git a/include/stdlib.h b/include/stdlib.h
index cae7f7cdf8..db51f4a4f6 100644
--- a/include/stdlib.h
+++ b/include/stdlib.h
@@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform;
 libc_hidden_proto (__arc4random_uniform);
 extern void __arc4random_buf_internal (void *buffer, size_t len)
      attribute_hidden;
-/* Called from the fork function to reinitialize the internal cipher state
-   in child process.  */
-extern void __arc4random_fork_subprocess (void) attribute_hidden;
 
 extern double __strtod_internal (const char *__restrict __nptr,
 				 char **__restrict __endptr, int __group)
diff --git a/manual/math.texi b/manual/math.texi
index 141695cc30..6d69bbff66 100644
--- a/manual/math.texi
+++ b/manual/math.texi
@@ -1993,17 +1993,10 @@ This section describes the random number functions provided as a GNU
 extension, based on OpenBSD interfaces.
 
 @Theglibc{} uses kernel entropy obtained either through @code{getrandom}
-or by reading @file{/dev/urandom} to seed and periodically re-seed the
-internal state.  A per-thread data pool is used, which allows fast output
-generation.
+or by reading @file{/dev/urandom} to seed.
 
-Although these functions provide higher random quality than ISO, BSD, and
-SVID functions, these still use a Pseudo-Random generator and should not
-be used in cryptographic contexts.
-
-The internal state is cleared and reseeded with kernel entropy on @code{fork}
-and @code{_Fork}.  It is not cleared on either a direct @code{clone} syscall
-or when using @theglibc{} @code{syscall} function.
+These functions provide higher random quality than ISO, BSD, and SVID
+functions, and may be used in cryptographic contexts.
 
 The prototypes for these functions are in @file{stdlib.h}.
 @pindex stdlib.h
diff --git a/stdlib/Makefile b/stdlib/Makefile
index a900962685..f7b25c1981 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -246,7 +246,6 @@ tests := \
   # tests
 
 tests-internal := \
-  tst-arc4random-chacha20 \
   tst-strtod1i \
   tst-strtod3 \
   tst-strtod4 \
@@ -256,7 +255,6 @@ tests-internal := \
   # tests-internal
 
 tests-static := \
-  tst-arc4random-chacha20 \
   tst-secure-getenv \
   # tests-static
 
diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
index 65547e79aa..8fcf41e7de 100644
--- a/stdlib/arc4random.c
+++ b/stdlib/arc4random.c
@@ -1,4 +1,4 @@
-/* Pseudo Random Number Generator based on ChaCha20.
+/* Pseudo Random Number Generator
    Copyright (C) 2022 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <arc4random.h>
 #include <errno.h>
 #include <not-cancel.h>
 #include <stdio.h>
@@ -24,53 +23,6 @@
 #include <sys/mman.h>
 #include <sys/param.h>
 #include <sys/random.h>
-#include <tls-internal.h>
-
-/* arc4random keeps two counters: 'have' is the current valid bytes not yet
-   consumed in 'buf' while 'count' is the maximum number of bytes until a
-   reseed.
-
-   Both the initial seed and reseed try to obtain entropy from the kernel
-   and abort the process if none could be obtained.
-
-   The state 'buf' improves the usage of the cipher calls, allowing to call
-   optimized implementations (if the architecture provides it) and minimize
-   function call overhead.  */
-
-#include <chacha20.c>
-
-/* Called from the fork function to reset the state.  */
-void
-__arc4random_fork_subprocess (void)
-{
-  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
-  if (state != NULL)
-    {
-      explicit_bzero (state, sizeof (*state));
-      /* Force key init.  */
-      state->count = -1;
-    }
-}
-
-/* Return the current thread random state or try to create one if there is
-   none available.  In the case malloc can not allocate a state, arc4random
-   will try to get entropy with arc4random_getentropy.  */
-static struct arc4random_state_t *
-arc4random_get_state (void)
-{
-  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
-  if (state == NULL)
-    {
-      state = malloc (sizeof (struct arc4random_state_t));
-      if (state != NULL)
-	{
-	  /* Force key initialization on first call.  */
-	  state->count = -1;
-	  __glibc_tls_internal ()->rand_state = state;
-	}
-    }
-  return state;
-}
 
 static void
 arc4random_getrandom_failure (void)
@@ -78,106 +30,72 @@ arc4random_getrandom_failure (void)
   __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
 }
 
-static void
-arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen)
+void
+__arc4random_buf (void *p, size_t n)
 {
-  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
+  static bool have_getrandom = true, seen_initialized = false;
+  int fd;
 
-  /* Mix optional user provided data.  */
-  if (rnd != NULL)
-    {
-      size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-      for (size_t i = 0; i < m; i++)
-	state->buf[i] ^= rnd[i];
-    }
-
-  /* Immediately reinit for backtracking resistance.  */
-  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
-  explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-}
-
-static void
-arc4random_getentropy (void *rnd, size_t len)
-{
-  if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len)
+  if (n == 0)
     return;
 
-  int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom",
-						  O_RDONLY | O_CLOEXEC));
-  if (fd != -1)
+  for (;;)
     {
-      uint8_t *p = rnd;
-      uint8_t *end = p + len;
-      do
-	{
-	  ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
-	  if (ret <= 0)
-	    arc4random_getrandom_failure ();
-	  p += ret;
-	}
-      while (p < end);
+      ssize_t l;
 
-      if (__close_nocancel (fd) == 0)
-	return;
-    }
-  arc4random_getrandom_failure ();
-}
+      if (!__ASSUME_GETRANDOM && !atomic_load_relaxed (&have_getrandom))
+	break;
 
-/* Check if the thread context STATE should be reseed with kernel entropy
-   depending of requested LEN bytes.  If there is less than requested,
-   the state is either initialized or reseeded, otherwise the internal
-   counter subtract the requested length.  */
-static void
-arc4random_check_stir (struct arc4random_state_t *state, size_t len)
-{
-  if (state->count <= len || state->count == -1)
-    {
-      uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
-      arc4random_getentropy (rnd, sizeof rnd);
-
-      if (state->count == -1)
-	chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
-      else
-	arc4random_rekey (state, rnd, sizeof rnd);
-
-      explicit_bzero (rnd, sizeof rnd);
-
-      /* Invalidate the buf.  */
-      state->have = 0;
-      memset (state->buf, 0, sizeof state->buf);
-      state->count = CHACHA20_RESEED_SIZE;
+      l = __getrandom_nocancel (p, n, 0);
+      if (l > 0)
+	{
+	  if ((size_t) l == n)
+	    return; /* Done reading, success.  */
+	  p = (uint8_t *) p + l;
+	  n -= l;
+	  continue; /* Interrupted by a signal; keep going.  */
+	}
+      else if (l == 0)
+	arc4random_getrandom_failure (); /* Weird, should never happen.  */
+      else if (l == -EINTR)
+	continue; /* Interrupted by a signal; keep going.  */
+      else if (!__ASSUME_GETRANDOM && l == -ENOSYS)
+	{
+	  atomic_store_relaxed (&have_getrandom, false);
+	  break; /* No syscall, so fallback to /dev/urandom.  */
+	}
+      arc4random_getrandom_failure (); /* Weird, should never happen.  */
     }
-  else
-    state->count -= len;
-}
 
-void
-__arc4random_buf (void *buffer, size_t len)
-{
-  struct arc4random_state_t *state = arc4random_get_state ();
-  if (__glibc_unlikely (state == NULL))
+  if (!atomic_load_relaxed (&seen_initialized))
     {
-      arc4random_getentropy (buffer, len);
-      return;
+      struct pollfd pfd = { .events = POLLIN };
+      pfd.fd = TEMP_FAILURE_RETRY (
+	  __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
+      if (pfd.fd < 0)
+	arc4random_getrandom_failure ();
+      if (TEMP_FAILURE_RETRY (__ppoll_infinity_nocancel (&pfd, 1)) < 0)
+	arc4random_getrandom_failure ();
+      if (__close_nocancel (pfd.fd) < 0)
+	arc4random_getrandom_failure ();
+      atomic_store_relaxed (&seen_initialized, true);
     }
 
-  arc4random_check_stir (state, len);
-  while (len > 0)
+  fd = TEMP_FAILURE_RETRY (
+      __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY));
+  if (fd < 0)
+    arc4random_getrandom_failure ();
+  do
     {
-      if (state->have > 0)
-	{
-	  size_t m = MIN (len, state->have);
-	  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
-	  memcpy (buffer, ks, m);
-	  explicit_bzero (ks, m);
-	  buffer += m;
-	  len -= m;
-	  state->have -= m;
-	}
-      if (state->have == 0)
-	arc4random_rekey (state, NULL, 0);
+      ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
+      if (l <= 0)
+	arc4random_getrandom_failure ();
+      p = (uint8_t *) p + l;
+      n -= l;
     }
+  while (n);
+  if (__close_nocancel (fd) < 0)
+    arc4random_getrandom_failure ();
 }
 libc_hidden_def (__arc4random_buf)
 weak_alias (__arc4random_buf, arc4random_buf)
@@ -186,22 +104,7 @@ uint32_t
 __arc4random (void)
 {
   uint32_t r;
-
-  struct arc4random_state_t *state = arc4random_get_state ();
-  if (__glibc_unlikely (state == NULL))
-    {
-      arc4random_getentropy (&r, sizeof (uint32_t));
-      return r;
-    }
-
-  arc4random_check_stir (state, sizeof (uint32_t));
-  if (state->have < sizeof (uint32_t))
-    arc4random_rekey (state, NULL, 0);
-  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
-  memcpy (&r, ks, sizeof (uint32_t));
-  memset (ks, 0, sizeof (uint32_t));
-  state->have -= sizeof (uint32_t);
-
+  __arc4random_buf (&r, sizeof (r));
   return r;
 }
 libc_hidden_def (__arc4random)
diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
deleted file mode 100644
index cd39389c19..0000000000
--- a/stdlib/arc4random.h
+++ /dev/null
@@ -1,48 +0,0 @@
-/* Arc4random definition used on TLS.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#ifndef _CHACHA20_H
-#define _CHACHA20_H
-
-#include <stddef.h>
-#include <stdint.h>
-
-/* Internal ChaCha20 state.  */
-#define CHACHA20_STATE_LEN	16
-#define CHACHA20_BLOCK_SIZE	64
-
-/* Maximum number bytes until reseed (16 MB).  */
-#define CHACHA20_RESEED_SIZE	(16 * 1024 * 1024)
-
-/* Internal arc4random buffer, used on each feedback step so offer some
-   backtracking protection and to allow better used of vectorized
-   chacha20 implementations.  */
-#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
-
-_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE,
-		"CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE");
-
-struct arc4random_state_t
-{
-  uint32_t ctx[CHACHA20_STATE_LEN];
-  size_t have;
-  size_t count;
-  uint8_t buf[CHACHA20_BUFSIZE];
-};
-
-#endif
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
deleted file mode 100644
index 2745a81315..0000000000
--- a/stdlib/chacha20.c
+++ /dev/null
@@ -1,191 +0,0 @@
-/* Generic ChaCha20 implementation (used on arc4random).
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <array_length.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <string.h>
-
-/* 32-bit stream position, then 96-bit nonce.  */
-#define CHACHA20_IV_SIZE	16
-#define CHACHA20_KEY_SIZE	32
-
-#define CHACHA20_STATE_LEN	16
-
-/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final
-   XOR of the keystream with the plaintext because the plaintext is a
-   stream of zeros.  */
-
-enum chacha20_constants
-{
-  CHACHA20_CONSTANT_EXPA = 0x61707865U,
-  CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
-  CHACHA20_CONSTANT_2_BY = 0x79622d32U,
-  CHACHA20_CONSTANT_TE_K = 0x6b206574U
-};
-
-static inline uint32_t
-read_unaligned_32 (const uint8_t *p)
-{
-  uint32_t r;
-  memcpy (&r, p, sizeof (r));
-  return r;
-}
-
-static inline void
-write_unaligned_32 (uint8_t *p, uint32_t v)
-{
-  memcpy (p, &v, sizeof (v));
-}
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
-# define set_state(v)		__builtin_bswap32 ((v))
-#else
-# define read_unaligned_le32(p) read_unaligned_32 ((p))
-# define set_state(v)		(v)
-#endif
-
-static inline void
-chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
-{
-  state[0]  = CHACHA20_CONSTANT_EXPA;
-  state[1]  = CHACHA20_CONSTANT_ND_3;
-  state[2]  = CHACHA20_CONSTANT_2_BY;
-  state[3]  = CHACHA20_CONSTANT_TE_K;
-
-  state[4]  = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
-  state[5]  = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
-  state[6]  = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
-  state[7]  = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
-  state[8]  = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
-  state[9]  = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
-  state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
-  state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
-
-  state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
-  state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
-  state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
-  state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
-}
-
-static inline uint32_t
-rotl32 (unsigned int shift, uint32_t word)
-{
-  return (word << (shift & 31)) | (word >> ((-shift) & 31));
-}
-
-static void
-state_final (const uint8_t *src, uint8_t *dst, uint32_t v)
-{
-#ifdef CHACHA20_XOR_FINAL
-  v ^= read_unaligned_32 (src);
-#endif
-  write_unaligned_32 (dst, v);
-}
-
-static inline void
-chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src)
-{
-  uint32_t x0, x1, x2, x3, x4, x5, x6, x7;
-  uint32_t x8, x9, x10, x11, x12, x13, x14, x15;
-
-  x0 = state[0];
-  x1 = state[1];
-  x2 = state[2];
-  x3 = state[3];
-  x4 = state[4];
-  x5 = state[5];
-  x6 = state[6];
-  x7 = state[7];
-  x8 = state[8];
-  x9 = state[9];
-  x10 = state[10];
-  x11 = state[11];
-  x12 = state[12];
-  x13 = state[13];
-  x14 = state[14];
-  x15 = state[15];
-
-  for (int i = 0; i < 20; i += 2)
-    {
-#define QROUND(_x0, _x1, _x2, _x3) 			\
-  do {							\
-   _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); 	\
-   _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); 	\
-   _x0 = _x0 + _x1; _x3 = rotl32 (8,  (_x0 ^ _x3));	\
-   _x2 = _x2 + _x3; _x1 = rotl32 (7,  (_x1 ^ _x2));	\
-  } while(0)
-
-      QROUND (x0, x4, x8,  x12);
-      QROUND (x1, x5, x9,  x13);
-      QROUND (x2, x6, x10, x14);
-      QROUND (x3, x7, x11, x15);
-
-      QROUND (x0, x5, x10, x15);
-      QROUND (x1, x6, x11, x12);
-      QROUND (x2, x7, x8,  x13);
-      QROUND (x3, x4, x9,  x14);
-    }
-
-  state_final (&src[0], &dst[0], set_state (x0 + state[0]));
-  state_final (&src[4], &dst[4], set_state (x1 + state[1]));
-  state_final (&src[8], &dst[8], set_state (x2 + state[2]));
-  state_final (&src[12], &dst[12], set_state (x3 + state[3]));
-  state_final (&src[16], &dst[16], set_state (x4 + state[4]));
-  state_final (&src[20], &dst[20], set_state (x5 + state[5]));
-  state_final (&src[24], &dst[24], set_state (x6 + state[6]));
-  state_final (&src[28], &dst[28], set_state (x7 + state[7]));
-  state_final (&src[32], &dst[32], set_state (x8 + state[8]));
-  state_final (&src[36], &dst[36], set_state (x9 + state[9]));
-  state_final (&src[40], &dst[40], set_state (x10 + state[10]));
-  state_final (&src[44], &dst[44], set_state (x11 + state[11]));
-  state_final (&src[48], &dst[48], set_state (x12 + state[12]));
-  state_final (&src[52], &dst[52], set_state (x13 + state[13]));
-  state_final (&src[56], &dst[56], set_state (x14 + state[14]));
-  state_final (&src[60], &dst[60], set_state (x15 + state[15]));
-
-  state[12]++;
-}
-
-static void
-__attribute_maybe_unused__
-chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
-			size_t bytes)
-{
-  while (bytes >= CHACHA20_BLOCK_SIZE)
-    {
-      chacha20_block (state, dst, src);
-
-      bytes -= CHACHA20_BLOCK_SIZE;
-      dst += CHACHA20_BLOCK_SIZE;
-      src += CHACHA20_BLOCK_SIZE;
-    }
-
-  if (__glibc_unlikely (bytes != 0))
-    {
-      uint8_t stream[CHACHA20_BLOCK_SIZE];
-      chacha20_block (state, stream, src);
-      memcpy (dst, stream, bytes);
-      explicit_bzero (stream, sizeof stream);
-    }
-}
-
-/* Get the architecture optimized version.  */
-#include <chacha20_arch.h>
diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c
deleted file mode 100644
index 45ba54920d..0000000000
--- a/stdlib/tst-arc4random-chacha20.c
+++ /dev/null
@@ -1,167 +0,0 @@
-/* Basic tests for chacha20 cypher used in arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <arc4random.h>
-#include <support/check.h>
-#include <sys/cdefs.h>
-
-/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random
-   actual does.  */
-#include <chacha20.c>
-
-static int
-do_test (void)
-{
-  const uint8_t key[CHACHA20_KEY_SIZE] =
-    {
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-    };
-  const uint8_t iv[CHACHA20_IV_SIZE] =
-    {
-      0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-    };
-  const uint8_t expected1[CHACHA20_BUFSIZE] =
-    {
-      0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a,
-      0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d,
-      0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda,
-      0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f,
-      0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1,
-      0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07,
-      0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73,
-      0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69,
-      0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21,
-      0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8,
-      0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac,
-      0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6,
-      0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0,
-      0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3,
-      0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f,
-      0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e,
-      0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71,
-      0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3,
-      0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5,
-      0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34,
-      0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d,
-      0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c,
-      0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31,
-      0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad,
-      0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0,
-      0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72,
-      0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16,
-      0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9,
-      0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a,
-      0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc,
-      0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2,
-      0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb,
-      0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10,
-      0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b,
-      0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2,
-      0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17,
-      0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca,
-      0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84,
-      0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3,
-      0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1,
-      0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7,
-      0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9,
-      0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3,
-      0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e,
-      0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb,
-      0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8,
-      0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb
-    };
-
-  const uint8_t expected2[CHACHA20_BUFSIZE] =
-    {
-      0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36,
-      0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22,
-      0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50,
-      0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10,
-      0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef,
-      0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30,
-      0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c,
-      0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6,
-      0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e,
-      0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90,
-      0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3,
-      0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba,
-      0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4,
-      0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea,
-      0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a,
-      0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80,
-      0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2,
-      0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8,
-      0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce,
-      0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb,
-      0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9,
-      0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc,
-      0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86,
-      0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0,
-      0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4,
-      0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1,
-      0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4,
-      0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f,
-      0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57,
-      0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97,
-      0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef,
-      0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13,
-      0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65,
-      0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30,
-      0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6,
-      0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee,
-      0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c,
-      0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66,
-      0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a,
-      0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c,
-      0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65,
-      0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b,
-      0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88,
-      0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1,
-      0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0,
-      0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36,
-      0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18
-    };
-
-  /* Check with the expected internal arc4random keystream buffer.  Some
-     architecture optimizations expects a buffer with a minimum size which
-     is a multiple of then ChaCha20 blocksize, so they might not be prepared
-     to handle smaller buffers.  */
-
-  uint8_t output[CHACHA20_BUFSIZE];
-
-  uint32_t state[CHACHA20_STATE_LEN];
-  chacha20_init (state, key, iv);
-
-  /* Check with the initial state.  */
-  uint8_t input[CHACHA20_BUFSIZE] = { 0 };
-
-  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
-  TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE);
-
-  /* And on the next round.  */
-  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
-  TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE);
-
-  return 0;
-}
-
-#include <support/test-driver.c>
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 7dfd1b62dd..17fb1c5b72 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -51,10 +51,6 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += tlsdesc.sym
 endif
 
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-aarch64
-endif
-
 ifeq ($(subdir),gmon)
 CFLAGS-mcount.c += -mgeneral-regs-only
 endif
diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S
deleted file mode 100644
index cce5291c5c..0000000000
--- a/sysdeps/aarch64/chacha20-aarch64.S
+++ /dev/null
@@ -1,314 +0,0 @@
-/* Optimized AArch64 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-
-/* Only LE is supported.  */
-#ifdef __AARCH64EL__
-
-#define GET_DATA_POINTER(reg, name) \
-        adrp    reg, name ; \
-        add     reg, reg, :lo12:name
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
-        ret; dsb sy; isb;
-
-.cpu generic+simd
-
-.text
-
-/* register macros */
-#define INPUT     x0
-#define DST       x1
-#define SRC       x2
-#define NBLKS     x3
-#define ROUND     x4
-#define INPUT_CTR x5
-#define INPUT_POS x6
-#define CTR       x7
-
-/* vector registers */
-#define X0 v16
-#define X4 v17
-#define X8 v18
-#define X12 v19
-
-#define X1 v20
-#define X5 v21
-
-#define X9 v22
-#define X13 v23
-#define X2 v24
-#define X6 v25
-
-#define X3 v26
-#define X7 v27
-#define X11 v28
-#define X15 v29
-
-#define X10 v30
-#define X14 v31
-
-#define VCTR    v0
-#define VTMP0   v1
-#define VTMP1   v2
-#define VTMP2   v3
-#define VTMP3   v4
-#define X12_TMP v5
-#define X13_TMP v6
-#define ROT8    v7
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-#define _(...) __VA_ARGS__
-
-#define vpunpckldq(s1, s2, dst) \
-	zip1 dst.4s, s2.4s, s1.4s;
-
-#define vpunpckhdq(s1, s2, dst) \
-	zip2 dst.4s, s2.4s, s1.4s;
-
-#define vpunpcklqdq(s1, s2, dst) \
-	zip1 dst.2d, s2.2d, s1.2d;
-
-#define vpunpckhqdq(s1, s2, dst) \
-	zip2 dst.2d, s2.2d, s1.2d;
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
-	vpunpckhdq(x1, x0, t2); \
-	vpunpckldq(x1, x0, x0); \
-	\
-	vpunpckldq(x3, x2, t1); \
-	vpunpckhdq(x3, x2, x2); \
-	\
-	vpunpckhqdq(t1, x0, x1); \
-	vpunpcklqdq(t1, x0, x0); \
-	\
-	vpunpckhqdq(x2, t2, x3); \
-	vpunpcklqdq(x2, t2, x2);
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define XOR(d,s1,s2) \
-	eor d.16b, s2.16b, s1.16b;
-
-#define PLUS(ds,s) \
-	add ds.4s, ds.4s, s.4s;
-
-#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \
-	shl dst1.4s, src1.4s, #(c);		\
-	shl dst2.4s, src2.4s, #(c);		\
-	shl dst3.4s, src3.4s, #(c);		\
-	shl dst4.4s, src4.4s, #(c);		\
-	sri dst1.4s, src1.4s, #(32 - (c));	\
-	sri dst2.4s, src2.4s, #(32 - (c));	\
-	sri dst3.4s, src3.4s, #(32 - (c));	\
-	sri dst4.4s, src4.4s, #(32 - (c));
-
-#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
-	tbl dst1.16b, {src1.16b}, ROT8.16b;     \
-	tbl dst2.16b, {src2.16b}, ROT8.16b;	\
-	tbl dst3.16b, {src3.16b}, ROT8.16b;	\
-	tbl dst4.16b, {src4.16b}, ROT8.16b;
-
-#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
-	rev32 dst1.8h, src1.8h;			\
-	rev32 dst2.8h, src2.8h;			\
-	rev32 dst3.8h, src3.8h;			\
-	rev32 dst4.8h, src4.8h;
-
-#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \
-	PLUS(a1,b1); PLUS(a2,b2);						\
-	PLUS(a3,b3); PLUS(a4,b4);						\
-	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
-	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
-		ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4);		\
-	PLUS(c1,d1); PLUS(c2,d2);						\
-	PLUS(c3,d3); PLUS(c4,d4);						\
-	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
-	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
-		ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4)		\
-	PLUS(a1,b1); PLUS(a2,b2);						\
-	PLUS(a3,b3); PLUS(a4,b4);						\
-	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
-	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
-		ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4)		\
-	PLUS(c1,d1); PLUS(c2,d2);						\
-	PLUS(c3,d3); PLUS(c4,d4);						\
-	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
-	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
-		ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4)		\
-
-.align 4
-L(__chacha20_blocks4_data_inc_counter):
-	.long 0,1,2,3
-
-.align 4
-L(__chacha20_blocks4_data_rot8):
-	.byte 3,0,1,2
-	.byte 7,4,5,6
-	.byte 11,8,9,10
-	.byte 15,12,13,14
-
-.hidden __chacha20_neon_blocks4
-ENTRY (__chacha20_neon_blocks4)
-	/* input:
-	 *	x0: input
-	 *	x1: dst
-	 *	x2: src
-	 *	x3: nblks (multiple of 4)
-	 */
-
-	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8))
-	add INPUT_CTR, INPUT, #(12*4);
-	ld1 {ROT8.16b}, [CTR];
-	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter))
-	mov INPUT_POS, INPUT;
-	ld1 {VCTR.16b}, [CTR];
-
-L(loop4):
-	/* Construct counter vectors X12 and X13 */
-
-	ld1 {X15.16b}, [INPUT_CTR];
-	mov ROUND, #20;
-	ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
-
-	dup X12.4s, X15.s[0];
-	dup X13.4s, X15.s[1];
-	ldr CTR, [INPUT_CTR];
-	add X12.4s, X12.4s, VCTR.4s;
-	dup X0.4s, VTMP1.s[0];
-	dup X1.4s, VTMP1.s[1];
-	dup X2.4s, VTMP1.s[2];
-	dup X3.4s, VTMP1.s[3];
-	dup X14.4s, X15.s[2];
-	cmhi VTMP0.4s, VCTR.4s, X12.4s;
-	dup X15.4s, X15.s[3];
-	add CTR, CTR, #4; /* Update counter */
-	dup X4.4s, VTMP2.s[0];
-	dup X5.4s, VTMP2.s[1];
-	dup X6.4s, VTMP2.s[2];
-	dup X7.4s, VTMP2.s[3];
-	sub X13.4s, X13.4s, VTMP0.4s;
-	dup X8.4s, VTMP3.s[0];
-	dup X9.4s, VTMP3.s[1];
-	dup X10.4s, VTMP3.s[2];
-	dup X11.4s, VTMP3.s[3];
-	mov X12_TMP.16b, X12.16b;
-	mov X13_TMP.16b, X13.16b;
-	str CTR, [INPUT_CTR];
-
-L(round2):
-	subs ROUND, ROUND, #2
-	QUARTERROUND4(X0, X4,  X8, X12,   X1, X5,  X9, X13,
-		      X2, X6, X10, X14,   X3, X7, X11, X15,
-		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
-	QUARTERROUND4(X0, X5, X10, X15,   X1, X6, X11, X12,
-		      X2, X7,  X8, X13,   X3, X4,  X9, X14,
-		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
-	b.ne L(round2);
-
-	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
-
-	PLUS(X12, X12_TMP);        /* INPUT + 12 * 4 + counter */
-	PLUS(X13, X13_TMP);        /* INPUT + 13 * 4 + counter */
-
-	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
-	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
-	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
-	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
-	PLUS(X0, VTMP2);
-	PLUS(X1, VTMP3);
-	PLUS(X2, X12_TMP);
-	PLUS(X3, X13_TMP);
-
-	dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
-	dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
-	dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
-	dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
-	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
-	mov INPUT_POS, INPUT;
-	PLUS(X4, VTMP2);
-	PLUS(X5, VTMP3);
-	PLUS(X6, X12_TMP);
-	PLUS(X7, X13_TMP);
-
-	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
-	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
-	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
-	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
-	dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
-	dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
-	PLUS(X8, VTMP2);
-	PLUS(X9, VTMP3);
-	PLUS(X10, X12_TMP);
-	PLUS(X11, X13_TMP);
-	PLUS(X14, VTMP0);
-	PLUS(X15, VTMP1);
-
-	transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
-
-	subs NBLKS, NBLKS, #4;
-
-	st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64
-	st1 {X1.16b,X5.16b}, [DST], #32;
-	st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64
-	st1 {X10.16b,X14.16b}, [DST], #32;
-	st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64;
-
-	b.ne L(loop4);
-
-	ret_spec_stop
-END (__chacha20_neon_blocks4)
-
-#endif
diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
deleted file mode 100644
index 37dbb917f1..0000000000
--- a/sysdeps/aarch64/chacha20_arch.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <ldsodefs.h>
-#include <stdbool.h>
-
-unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
-#ifdef __AARCH64EL__
-  __chacha20_neon_blocks4 (state, dst, src,
-			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  chacha20_crypt_generic (state, dst, src, bytes);
-#endif
-}
diff --git a/sysdeps/generic/tls-internal-struct.h b/sysdeps/generic/tls-internal-struct.h
index a91915831b..d76c715a96 100644
--- a/sysdeps/generic/tls-internal-struct.h
+++ b/sysdeps/generic/tls-internal-struct.h
@@ -23,7 +23,6 @@ struct tls_internal_t
 {
   char *strsignal_buf;
   char *strerror_l_buf;
-  struct arc4random_state_t *rand_state;
 };
 
 #endif
diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
index 8a0f37d509..b32b31b5a9 100644
--- a/sysdeps/generic/tls-internal.c
+++ b/sysdeps/generic/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -27,13 +26,4 @@ __glibc_tls_internal_free (void)
 {
   free (__tls_internal.strsignal_buf);
   free (__tls_internal.strerror_l_buf);
-
-  if (__tls_internal.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-	 cached it won't leak any data.  */
-      explicit_bzero (__tls_internal.rand_state,
-		      sizeof (*__tls_internal.rand_state));
-      free (__tls_internal.rand_state);
-    }
 }
diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
index 667068c8cf..e60b86fab1 100644
--- a/sysdeps/mach/hurd/_Fork.c
+++ b/sysdeps/mach/hurd/_Fork.c
@@ -662,8 +662,6 @@ retry:
       _hurd_malloc_fork_child ();
       call_function_static_weak (__malloc_fork_unlock_child);
 
-      call_function_static_weak (__arc4random_fork_subprocess);
-
       /* Run things that want to run in the child task to set up.  */
       RUN_HOOK (_hurd_fork_child_hook, ());
 
diff --git a/sysdeps/mach/hurd/kernel-features.h b/sysdeps/mach/hurd/kernel-features.h
index a7579f6d68..ce97627dc8 100644
--- a/sysdeps/mach/hurd/kernel-features.h
+++ b/sysdeps/mach/hurd/kernel-features.h
@@ -21,3 +21,4 @@
    But those referring to POSIX-level features like O_* flags can be.  */
 
 #define __ASSUME_CLOSE_RANGE 1
+#define __ASSUME_GETRANDOM 1
diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index 7dc02569f6..dd568992e2 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -43,8 +43,6 @@ _Fork (void)
       self->robust_head.list = &self->robust_head;
       INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
 			     sizeof (struct robust_list_head));
-
-      call_function_static_weak (__arc4random_fork_subprocess);
     }
   return pid;
 }
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
deleted file mode 100644
index 8c75165f7f..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
+++ /dev/null
@@ -1,4 +0,0 @@
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
deleted file mode 100644
index cf9e735326..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
deleted file mode 100644
index 08494dc045..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
+++ /dev/null
@@ -1,42 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-					const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-		const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  unsigned long int hwcap = GLRO(dl_hwcap);
-  unsigned long int hwcap2 = GLRO(dl_hwcap2);
-  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
-    __chacha20_power8_blocks4 (state, dst, src,
-			       CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
index abb0aa3f11..71a59529f3 100644
--- a/sysdeps/powerpc/powerpc64/power8/Makefile
+++ b/sysdeps/powerpc/powerpc64/power8/Makefile
@@ -1,8 +1,3 @@
 ifeq ($(subdir),string)
 sysdep_routines += strcasestr-ppc64
 endif
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
deleted file mode 100644
index 0bbdcb9363..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/* Optimized PowerPC implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20
-   Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <altivec.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <sys/cdefs.h>
-
-typedef vector unsigned char vector16x_u8;
-typedef vector unsigned int vector4x_u32;
-typedef vector unsigned long long vector2x_u64;
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-static const vector16x_u8 le_bswap_const =
-  { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
-#endif
-
-static inline vector4x_u32
-vec_rol_elems (vector4x_u32 v, unsigned int idx)
-{
-#if __BYTE_ORDER != __BIG_ENDIAN
-  return vec_sld (v, v, (16 - (4 * idx)) & 15);
-#else
-  return vec_sld (v, v, (4 * idx) & 15);
-#endif
-}
-
-static inline vector4x_u32
-vec_load_le (unsigned long offset, const unsigned char *ptr)
-{
-  vector4x_u32 vec;
-  vec = vec_vsx_ld (offset, (const uint32_t *)ptr);
-#if __BYTE_ORDER == __BIG_ENDIAN
-  vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec,
-				 le_bswap_const);
-#endif
-  return vec;
-}
-
-static inline void
-vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
-  vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec,
-			       le_bswap_const);
-#endif
-  vec_vsx_st (vec, offset, (uint32_t *)ptr);
-}
-
-
-static inline vector4x_u32
-vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
-  static const vector16x_u8 swap32 =
-    { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 };
-  vector2x_u64 vec, add, sum;
-
-  vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32);
-  add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32);
-  sum = vec + add;
-  return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32);
-#else
-  return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a));
-#endif
-}
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define ROTATE(v1,rolv)			\
-	__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv))
-
-#define PLUS(ds,s) \
-	((ds) += (s))
-
-#define XOR(ds,s) \
-	((ds) ^= (s))
-
-#define ADD_U64(v,a) \
-	(v = vec_add_ctr_u64(v, a))
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3) ({ \
-	vector4x_u32 t1 = vec_mergeh(x0, x2); \
-	vector4x_u32 t2 = vec_mergel(x0, x2); \
-	vector4x_u32 t3 = vec_mergeh(x1, x3); \
-	x3 = vec_mergel(x1, x3); \
-	x0 = vec_mergeh(t1, t3); \
-	x1 = vec_mergel(t1, t3); \
-	x2 = vec_mergeh(t2, x3); \
-	x3 = vec_mergel(t2, x3); \
-      })
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2)			\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE(d1, rotate_16); ROTATE(d2, rotate_16);	\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE(b1, rotate_12); ROTATE(b2, rotate_12);	\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE(d1, rotate_8); ROTATE(d2, rotate_8);		\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE(b1, rotate_7); ROTATE(b2, rotate_7);
-
-unsigned int attribute_hidden
-__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src,
-			   size_t nblks)
-{
-  vector4x_u32 counters_0123 = { 0, 1, 2, 3 };
-  vector4x_u32 counter_4 = { 4, 0, 0, 0 };
-  vector4x_u32 rotate_16 = { 16, 16, 16, 16 };
-  vector4x_u32 rotate_12 = { 12, 12, 12, 12 };
-  vector4x_u32 rotate_8 = { 8, 8, 8, 8 };
-  vector4x_u32 rotate_7 = { 7, 7, 7, 7 };
-  vector4x_u32 state0, state1, state2, state3;
-  vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7;
-  vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15;
-  vector4x_u32 tmp;
-  int i;
-
-  /* Force preload of constants to vector registers.  */
-  __asm__ ("": "+v" (counters_0123) :: "memory");
-  __asm__ ("": "+v" (counter_4) :: "memory");
-  __asm__ ("": "+v" (rotate_16) :: "memory");
-  __asm__ ("": "+v" (rotate_12) :: "memory");
-  __asm__ ("": "+v" (rotate_8) :: "memory");
-  __asm__ ("": "+v" (rotate_7) :: "memory");
-
-  state0 = vec_vsx_ld (0 * 16, state);
-  state1 = vec_vsx_ld (1 * 16, state);
-  state2 = vec_vsx_ld (2 * 16, state);
-  state3 = vec_vsx_ld (3 * 16, state);
-
-  do
-    {
-      v0 = vec_splat (state0, 0);
-      v1 = vec_splat (state0, 1);
-      v2 = vec_splat (state0, 2);
-      v3 = vec_splat (state0, 3);
-      v4 = vec_splat (state1, 0);
-      v5 = vec_splat (state1, 1);
-      v6 = vec_splat (state1, 2);
-      v7 = vec_splat (state1, 3);
-      v8 = vec_splat (state2, 0);
-      v9 = vec_splat (state2, 1);
-      v10 = vec_splat (state2, 2);
-      v11 = vec_splat (state2, 3);
-      v12 = vec_splat (state3, 0);
-      v13 = vec_splat (state3, 1);
-      v14 = vec_splat (state3, 2);
-      v15 = vec_splat (state3, 3);
-
-      v12 += counters_0123;
-      v13 -= vec_cmplt (v12, counters_0123);
-
-      for (i = 20; i > 0; i -= 2)
-	{
-	  QUARTERROUND2 (v0, v4,  v8, v12,   v1, v5,  v9, v13)
-	  QUARTERROUND2 (v2, v6, v10, v14,   v3, v7, v11, v15)
-	  QUARTERROUND2 (v0, v5, v10, v15,   v1, v6, v11, v12)
-	  QUARTERROUND2 (v2, v7,  v8, v13,   v3, v4,  v9, v14)
-	}
-
-      v0 += vec_splat (state0, 0);
-      v1 += vec_splat (state0, 1);
-      v2 += vec_splat (state0, 2);
-      v3 += vec_splat (state0, 3);
-      v4 += vec_splat (state1, 0);
-      v5 += vec_splat (state1, 1);
-      v6 += vec_splat (state1, 2);
-      v7 += vec_splat (state1, 3);
-      v8 += vec_splat (state2, 0);
-      v9 += vec_splat (state2, 1);
-      v10 += vec_splat (state2, 2);
-      v11 += vec_splat (state2, 3);
-      tmp = vec_splat( state3, 0);
-      tmp += counters_0123;
-      v12 += tmp;
-      v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123);
-      v14 += vec_splat (state3, 2);
-      v15 += vec_splat (state3, 3);
-      ADD_U64 (state3, counter_4);
-
-      transpose_4x4 (v0, v1, v2, v3);
-      transpose_4x4 (v4, v5, v6, v7);
-      transpose_4x4 (v8, v9, v10, v11);
-      transpose_4x4 (v12, v13, v14, v15);
-
-      vec_store_le (v0, (64 * 0 + 16 * 0), dst);
-      vec_store_le (v1, (64 * 1 + 16 * 0), dst);
-      vec_store_le (v2, (64 * 2 + 16 * 0), dst);
-      vec_store_le (v3, (64 * 3 + 16 * 0), dst);
-
-      vec_store_le (v4, (64 * 0 + 16 * 1), dst);
-      vec_store_le (v5, (64 * 1 + 16 * 1), dst);
-      vec_store_le (v6, (64 * 2 + 16 * 1), dst);
-      vec_store_le (v7, (64 * 3 + 16 * 1), dst);
-
-      vec_store_le (v8, (64 * 0 + 16 * 2), dst);
-      vec_store_le (v9, (64 * 1 + 16 * 2), dst);
-      vec_store_le (v10, (64 * 2 + 16 * 2), dst);
-      vec_store_le (v11, (64 * 3 + 16 * 2), dst);
-
-      vec_store_le (v12, (64 * 0 + 16 * 3), dst);
-      vec_store_le (v13, (64 * 1 + 16 * 3), dst);
-      vec_store_le (v14, (64 * 2 + 16 * 3), dst);
-      vec_store_le (v15, (64 * 3 + 16 * 3), dst);
-
-      src += 4*64;
-      dst += 4*64;
-
-      nblks -= 4;
-    }
-  while (nblks);
-
-  vec_vsx_st (state3, 3 * 16, state);
-
-  return 0;
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
deleted file mode 100644
index ded06762b6..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
+++ /dev/null
@@ -1,37 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-					const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-		const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  __chacha20_power8_blocks4 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-}
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index 96c110f490..66ed844e68 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache
 endif
 
 endif # $(subdir) == elf
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
-  chacha20-s390x \
-  # sysdep_routines
-endif
diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S
deleted file mode 100644
index e38504d370..0000000000
--- a/sysdeps/s390/s390-64/chacha20-s390x.S
+++ /dev/null
@@ -1,573 +0,0 @@
-/* Optimized s390x implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-s390x.S  -  zSeries implementation of ChaCha20 cipher
-
-   Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <sysdep.h>
-
-#ifdef HAVE_S390_VX_ASM_SUPPORT
-
-/* CFA expressions are used for pointing CFA and registers to
- * SP relative offsets. */
-# define DW_REGNO_SP 15
-
-/* Fixed length encoding used for integers for now. */
-# define DW_SLEB128_7BIT(value) \
-        0x00|((value) & 0x7f)
-# define DW_SLEB128_28BIT(value) \
-        0x80|((value)&0x7f), \
-        0x80|(((value)>>7)&0x7f), \
-        0x80|(((value)>>14)&0x7f), \
-        0x00|(((value)>>21)&0x7f)
-
-# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \
-        .cfi_escape \
-          0x0f, /* DW_CFA_def_cfa_expression */ \
-            DW_SLEB128_7BIT(11), /* length */ \
-          0x7f, /* DW_OP_breg15, rsp + constant */ \
-            DW_SLEB128_28BIT(rsp_offs), \
-          0x06, /* DW_OP_deref */ \
-          0x23, /* DW_OP_plus_constu */ \
-            DW_SLEB128_28BIT((cfa_depth)+160)
-
-.machine "z13+vx"
-.text
-
-.balign 16
-.Lconsts:
-.Lwordswap:
-	.byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3
-.Lbswap128:
-	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-.Lbswap32:
-	.byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
-.Lone:
-	.long 0, 0, 0, 1
-.Ladd_counter_0123:
-	.long 0, 1, 2, 3
-.Ladd_counter_4567:
-	.long 4, 5, 6, 7
-
-/* register macros */
-#define INPUT %r2
-#define DST   %r3
-#define SRC   %r4
-#define NBLKS %r0
-#define ROUND %r1
-
-/* stack structure */
-
-#define STACK_FRAME_STD    (8 * 16 + 8 * 4)
-#define STACK_FRAME_F8_F15 (8 * 8)
-#define STACK_FRAME_Y0_Y15 (16 * 16)
-#define STACK_FRAME_CTR    (4 * 16)
-#define STACK_FRAME_PARAMS (6 * 8)
-
-#define STACK_MAX   (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \
-		     STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \
-		     STACK_FRAME_PARAMS)
-
-#define STACK_F8     (STACK_MAX - STACK_FRAME_F8_F15)
-#define STACK_F9     (STACK_F8 + 8)
-#define STACK_F10    (STACK_F9 + 8)
-#define STACK_F11    (STACK_F10 + 8)
-#define STACK_F12    (STACK_F11 + 8)
-#define STACK_F13    (STACK_F12 + 8)
-#define STACK_F14    (STACK_F13 + 8)
-#define STACK_F15    (STACK_F14 + 8)
-#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15)
-#define STACK_CTR    (STACK_Y0_Y15 - STACK_FRAME_CTR)
-#define STACK_INPUT  (STACK_CTR - STACK_FRAME_PARAMS)
-#define STACK_DST    (STACK_INPUT + 8)
-#define STACK_SRC    (STACK_DST + 8)
-#define STACK_NBLKS  (STACK_SRC + 8)
-#define STACK_POCTX  (STACK_NBLKS + 8)
-#define STACK_POSRC  (STACK_POCTX + 8)
-
-#define STACK_G0_H3  STACK_Y0_Y15
-
-/* vector registers */
-#define A0 %v0
-#define A1 %v1
-#define A2 %v2
-#define A3 %v3
-
-#define B0 %v4
-#define B1 %v5
-#define B2 %v6
-#define B3 %v7
-
-#define C0 %v8
-#define C1 %v9
-#define C2 %v10
-#define C3 %v11
-
-#define D0 %v12
-#define D1 %v13
-#define D2 %v14
-#define D3 %v15
-
-#define E0 %v16
-#define E1 %v17
-#define E2 %v18
-#define E3 %v19
-
-#define F0 %v20
-#define F1 %v21
-#define F2 %v22
-#define F3 %v23
-
-#define G0 %v24
-#define G1 %v25
-#define G2 %v26
-#define G3 %v27
-
-#define H0 %v28
-#define H1 %v29
-#define H2 %v30
-#define H3 %v31
-
-#define IO0 E0
-#define IO1 E1
-#define IO2 E2
-#define IO3 E3
-#define IO4 F0
-#define IO5 F1
-#define IO6 F2
-#define IO7 F3
-
-#define S0 G0
-#define S1 G1
-#define S2 G2
-#define S3 G3
-
-#define TMP0 H0
-#define TMP1 H1
-#define TMP2 H2
-#define TMP3 H3
-
-#define X0 A0
-#define X1 A1
-#define X2 A2
-#define X3 A3
-#define X4 B0
-#define X5 B1
-#define X6 B2
-#define X7 B3
-#define X8 C0
-#define X9 C1
-#define X10 C2
-#define X11 C3
-#define X12 D0
-#define X13 D1
-#define X14 D2
-#define X15 D3
-
-#define Y0 E0
-#define Y1 E1
-#define Y2 E2
-#define Y3 E3
-#define Y4 F0
-#define Y5 F1
-#define Y6 F2
-#define Y7 F3
-#define Y8 G0
-#define Y9 G1
-#define Y10 G2
-#define Y11 G3
-#define Y12 H0
-#define Y13 H1
-#define Y14 H2
-#define Y15 H3
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-#define _ /*_*/
-
-#define START_STACK(last_r) \
-	lgr %r0, %r15; \
-	lghi %r1, ~15; \
-	stmg %r6, last_r, 6 * 8(%r15); \
-	aghi %r0, -STACK_MAX; \
-	ngr %r0, %r1; \
-	lgr %r1, %r15; \
-	cfi_def_cfa_register(1); \
-	lgr %r15, %r0; \
-	stg %r1, 0(%r15); \
-	cfi_cfa_on_stack(0, 0); \
-	std %f8, STACK_F8(%r15); \
-	std %f9, STACK_F9(%r15); \
-	std %f10, STACK_F10(%r15); \
-	std %f11, STACK_F11(%r15); \
-	std %f12, STACK_F12(%r15); \
-	std %f13, STACK_F13(%r15); \
-	std %f14, STACK_F14(%r15); \
-	std %f15, STACK_F15(%r15);
-
-#define END_STACK(last_r) \
-	lg %r1, 0(%r15); \
-	ld %f8, STACK_F8(%r15); \
-	ld %f9, STACK_F9(%r15); \
-	ld %f10, STACK_F10(%r15); \
-	ld %f11, STACK_F11(%r15); \
-	ld %f12, STACK_F12(%r15); \
-	ld %f13, STACK_F13(%r15); \
-	ld %f14, STACK_F14(%r15); \
-	ld %f15, STACK_F15(%r15); \
-	lmg %r6, last_r, 6 * 8(%r1); \
-	lgr %r15, %r1; \
-	cfi_def_cfa_register(DW_REGNO_SP);
-
-#define PLUS(dst,src) \
-	vaf dst, dst, src;
-
-#define XOR(dst,src) \
-	vx dst, dst, src;
-
-#define ROTATE(v1,c) \
-	verllf v1, v1, (c)(0);
-
-#define WORD_ROTATE(v1,s) \
-	vsldb v1, v1, v1, ((s) * 4);
-
-#define DST_8(OPER, I, J) \
-	OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \
-	OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J);
-
-/**********************************************************************
-  round macros
- **********************************************************************/
-
-/**********************************************************************
-  8-way chacha20 ("vertical")
- **********************************************************************/
-
-#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
-			      x8,x9,x10,x11,x12,x13,x14,x15,\
-			      y0,y1,y2,y3,y4,y5,y6,y7,\
-			      y8,y9,y10,y11,y12,y13,y14,y15,\
-			      op1,op2,op3,op4,op5,op6,op7,op8,\
-			      op9,op10,op11,op12) \
-	op1;							\
-	PLUS(x0, x1); PLUS(x4, x5);				\
-	PLUS(x8, x9); PLUS(x12, x13);				\
-	PLUS(y0, y1); PLUS(y4, y5);				\
-	PLUS(y8, y9); PLUS(y12, y13);				\
-	    op2;						\
-	    XOR(x3, x0);  XOR(x7, x4);				\
-	    XOR(x11, x8); XOR(x15, x12);			\
-	    XOR(y3, y0);  XOR(y7, y4);				\
-	    XOR(y11, y8); XOR(y15, y12);			\
-		op3;						\
-		ROTATE(x3, 16); ROTATE(x7, 16);			\
-		ROTATE(x11, 16); ROTATE(x15, 16);		\
-		ROTATE(y3, 16); ROTATE(y7, 16);			\
-		ROTATE(y11, 16); ROTATE(y15, 16);		\
-	op4;							\
-	PLUS(x2, x3); PLUS(x6, x7);				\
-	PLUS(x10, x11); PLUS(x14, x15);				\
-	PLUS(y2, y3); PLUS(y6, y7);				\
-	PLUS(y10, y11); PLUS(y14, y15);				\
-	    op5;						\
-	    XOR(x1, x2); XOR(x5, x6);				\
-	    XOR(x9, x10); XOR(x13, x14);			\
-	    XOR(y1, y2); XOR(y5, y6);				\
-	    XOR(y9, y10); XOR(y13, y14);			\
-		op6;						\
-		ROTATE(x1,12); ROTATE(x5,12);			\
-		ROTATE(x9,12); ROTATE(x13,12);			\
-		ROTATE(y1,12); ROTATE(y5,12);			\
-		ROTATE(y9,12); ROTATE(y13,12);			\
-	op7;							\
-	PLUS(x0, x1); PLUS(x4, x5);				\
-	PLUS(x8, x9); PLUS(x12, x13);				\
-	PLUS(y0, y1); PLUS(y4, y5);				\
-	PLUS(y8, y9); PLUS(y12, y13);				\
-	    op8;						\
-	    XOR(x3, x0); XOR(x7, x4);				\
-	    XOR(x11, x8); XOR(x15, x12);			\
-	    XOR(y3, y0); XOR(y7, y4);				\
-	    XOR(y11, y8); XOR(y15, y12);			\
-		op9;						\
-		ROTATE(x3,8); ROTATE(x7,8);			\
-		ROTATE(x11,8); ROTATE(x15,8);			\
-		ROTATE(y3,8); ROTATE(y7,8);			\
-		ROTATE(y11,8); ROTATE(y15,8);			\
-	op10;							\
-	PLUS(x2, x3); PLUS(x6, x7);				\
-	PLUS(x10, x11); PLUS(x14, x15);				\
-	PLUS(y2, y3); PLUS(y6, y7);				\
-	PLUS(y10, y11); PLUS(y14, y15);				\
-	    op11;						\
-	    XOR(x1, x2); XOR(x5, x6);				\
-	    XOR(x9, x10); XOR(x13, x14);			\
-	    XOR(y1, y2); XOR(y5, y6);				\
-	    XOR(y9, y10); XOR(y13, y14);			\
-		op12;						\
-		ROTATE(x1,7); ROTATE(x5,7);			\
-		ROTATE(x9,7); ROTATE(x13,7);			\
-		ROTATE(y1,7); ROTATE(y5,7);			\
-		ROTATE(y9,7); ROTATE(y13,7);
-
-#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\
-			 y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \
-	QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
-			      x8,x9,x10,x11,x12,x13,x14,x15,\
-			      y0,y1,y2,y3,y4,y5,y6,y7,\
-			      y8,y9,y10,y11,y12,y13,y14,y15,\
-			      ,,,,,,,,,,,)
-
-#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \
-	  vmrhf tmp0, v0, v1;					\
-	  vmrhf tmp1, v2, v3;					\
-	  vmrlf tmp2, v0, v1;					\
-	  vmrlf   v3, v2, v3;					\
-	  vmrhf tmpa, va, vb;					\
-	  vmrhf tmpb, vc, vd;					\
-	  vmrlf tmpc, va, vb;					\
-	  vmrlf   vd, vc, vd;					\
-	  vpdi v0, tmp0, tmp1, 0;				\
-	  vpdi v1, tmp0, tmp1, 5;				\
-	  vpdi v2, tmp2,   v3, 0;				\
-	  vpdi v3, tmp2,   v3, 5;				\
-	  vpdi va, tmpa, tmpb, 0;				\
-	  vpdi vb, tmpa, tmpb, 5;				\
-	  vpdi vc, tmpc,   vd, 0;				\
-	  vpdi vd, tmpc,   vd, 5;
-
-.balign 8
-.globl __chacha20_s390x_vx_blocks8
-ENTRY (__chacha20_s390x_vx_blocks8)
-	/* input:
-	 *	%r2: input
-	 *	%r3: dst
-	 *	%r4: src
-	 *	%r5: nblks (multiple of 8)
-	 */
-
-	START_STACK(%r8);
-	lgr NBLKS, %r5;
-
-	larl %r7, .Lconsts;
-
-	/* Load counter. */
-	lg %r8, (12 * 4)(INPUT);
-	rllg %r8, %r8, 32;
-
-.balign 4
-	/* Process eight chacha20 blocks per loop. */
-.Lloop8:
-	vlm Y0, Y3, 0(INPUT);
-
-	slgfi NBLKS, 8;
-	lghi ROUND, (20 / 2);
-
-	/* Construct counter vectors X12/X13 & Y12/Y13. */
-	vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7);
-	vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7);
-	vrepf Y12, Y3, 0;
-	vrepf Y13, Y3, 1;
-	vaccf X5, Y12, X4;
-	vaccf Y5, Y12, Y4;
-	vaf X12, Y12, X4;
-	vaf Y12, Y12, Y4;
-	vaf X13, Y13, X5;
-	vaf Y13, Y13, Y5;
-
-	vrepf X0, Y0, 0;
-	vrepf X1, Y0, 1;
-	vrepf X2, Y0, 2;
-	vrepf X3, Y0, 3;
-	vrepf X4, Y1, 0;
-	vrepf X5, Y1, 1;
-	vrepf X6, Y1, 2;
-	vrepf X7, Y1, 3;
-	vrepf X8, Y2, 0;
-	vrepf X9, Y2, 1;
-	vrepf X10, Y2, 2;
-	vrepf X11, Y2, 3;
-	vrepf X14, Y3, 2;
-	vrepf X15, Y3, 3;
-
-	/* Store counters for blocks 0-7. */
-	vstm X12, X13, (STACK_CTR + 0 * 16)(%r15);
-	vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15);
-
-	vlr Y0, X0;
-	vlr Y1, X1;
-	vlr Y2, X2;
-	vlr Y3, X3;
-	vlr Y4, X4;
-	vlr Y5, X5;
-	vlr Y6, X6;
-	vlr Y7, X7;
-	vlr Y8, X8;
-	vlr Y9, X9;
-	vlr Y10, X10;
-	vlr Y11, X11;
-	vlr Y14, X14;
-	vlr Y15, X15;
-
-	/* Update and store counter. */
-	agfi %r8, 8;
-	rllg %r5, %r8, 32;
-	stg %r5, (12 * 4)(INPUT);
-
-.balign 4
-.Lround2_8:
-	QUARTERROUND4_V8(X0, X4,  X8, X12,   X1, X5,  X9, X13,
-			 X2, X6, X10, X14,   X3, X7, X11, X15,
-			 Y0, Y4,  Y8, Y12,   Y1, Y5,  Y9, Y13,
-			 Y2, Y6, Y10, Y14,   Y3, Y7, Y11, Y15);
-	QUARTERROUND4_V8(X0, X5, X10, X15,   X1, X6, X11, X12,
-			 X2, X7,  X8, X13,   X3, X4,  X9, X14,
-			 Y0, Y5, Y10, Y15,   Y1, Y6, Y11, Y12,
-			 Y2, Y7,  Y8, Y13,   Y3, Y4,  Y9, Y14);
-	brctg ROUND, .Lround2_8;
-
-	/* Store blocks 4-7. */
-	vstm Y0, Y15, STACK_Y0_Y15(%r15);
-
-	/* Load counters for blocks 0-3. */
-	vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15);
-
-	lghi ROUND, 1;
-	j .Lfirst_output_4blks_8;
-
-.balign 4
-.Lsecond_output_4blks_8:
-	/* Load blocks 4-7. */
-	vlm X0, X15, STACK_Y0_Y15(%r15);
-
-	/* Load counters for blocks 4-7. */
-	vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15);
-
-	lghi ROUND, 0;
-
-.balign 4
-	/* Output four chacha20 blocks per loop. */
-.Lfirst_output_4blks_8:
-	vlm Y12, Y15, 0(INPUT);
-	PLUS(X12, Y0);
-	PLUS(X13, Y1);
-	vrepf Y0, Y12, 0;
-	vrepf Y1, Y12, 1;
-	vrepf Y2, Y12, 2;
-	vrepf Y3, Y12, 3;
-	vrepf Y4, Y13, 0;
-	vrepf Y5, Y13, 1;
-	vrepf Y6, Y13, 2;
-	vrepf Y7, Y13, 3;
-	vrepf Y8, Y14, 0;
-	vrepf Y9, Y14, 1;
-	vrepf Y10, Y14, 2;
-	vrepf Y11, Y14, 3;
-	vrepf Y14, Y15, 2;
-	vrepf Y15, Y15, 3;
-	PLUS(X0, Y0);
-	PLUS(X1, Y1);
-	PLUS(X2, Y2);
-	PLUS(X3, Y3);
-	PLUS(X4, Y4);
-	PLUS(X5, Y5);
-	PLUS(X6, Y6);
-	PLUS(X7, Y7);
-	PLUS(X8, Y8);
-	PLUS(X9, Y9);
-	PLUS(X10, Y10);
-	PLUS(X11, Y11);
-	PLUS(X14, Y14);
-	PLUS(X15, Y15);
-
-	vl Y15, (.Lbswap32 - .Lconsts)(%r7);
-	TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7,
-			Y9, Y10, Y11, Y12, Y13, Y14);
-	TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15,
-			Y9, Y10, Y11, Y12, Y13, Y14);
-
-	vlm Y0, Y14, 0(SRC);
-	vperm X0, X0, X0, Y15;
-	vperm X1, X1, X1, Y15;
-	vperm X2, X2, X2, Y15;
-	vperm X3, X3, X3, Y15;
-	vperm X4, X4, X4, Y15;
-	vperm X5, X5, X5, Y15;
-	vperm X6, X6, X6, Y15;
-	vperm X7, X7, X7, Y15;
-	vperm X8, X8, X8, Y15;
-	vperm X9, X9, X9, Y15;
-	vperm X10, X10, X10, Y15;
-	vperm X11, X11, X11, Y15;
-	vperm X12, X12, X12, Y15;
-	vperm X13, X13, X13, Y15;
-	vperm X14, X14, X14, Y15;
-	vperm X15, X15, X15, Y15;
-	vl Y15, (15 * 16)(SRC);
-
-	XOR(Y0, X0);
-	XOR(Y1, X4);
-	XOR(Y2, X8);
-	XOR(Y3, X12);
-	XOR(Y4, X1);
-	XOR(Y5, X5);
-	XOR(Y6, X9);
-	XOR(Y7, X13);
-	XOR(Y8, X2);
-	XOR(Y9, X6);
-	XOR(Y10, X10);
-	XOR(Y11, X14);
-	XOR(Y12, X3);
-	XOR(Y13, X7);
-	XOR(Y14, X11);
-	XOR(Y15, X15);
-	vstm Y0, Y15, 0(DST);
-
-	aghi SRC, 256;
-	aghi DST, 256;
-
-	clgije ROUND, 1, .Lsecond_output_4blks_8;
-
-	clgijhe NBLKS, 8, .Lloop8;
-
-
-	END_STACK(%r8);
-	xgr %r2, %r2;
-	br %r14;
-END (__chacha20_s390x_vx_blocks8)
-
-#endif /* HAVE_S390_VX_ASM_SUPPORT */
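For readers following the removed vector code: the QUARTERROUND4_V8_POLY macro above applies the PLUS / XOR / ROTATE steps (rotations 16, 12, 8, 7) to many lanes at once. A minimal scalar sketch of a single lane, for illustration only, could look like the following. The names rotl32, quarterround and quarterround_selfcheck are hypothetical and not part of the patch; the check values are the quarter-round test vector from RFC 8439, section 2.1.1.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by C bits (0 < C < 32).  */
static inline uint32_t
rotl32 (uint32_t v, unsigned int c)
{
  return (v << c) | (v >> (32 - c));
}

/* One ChaCha quarter round on four state words, in the same order as
   the PLUS / XOR / ROTATE steps of the vector macro.  */
static void
quarterround (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 7);
}

/* Compare against the RFC 8439 section 2.1.1 test vector.
   Returns 1 on match, 0 otherwise.  */
static int
quarterround_selfcheck (void)
{
  uint32_t a = 0x11111111, b = 0x01020304;
  uint32_t c = 0x9b8d6f43, d = 0x01234567;

  quarterround (&a, &b, &c, &d);
  return a == 0xea2a92f4 && b == 0xcb1cf8ce
         && c == 0x4581472e && d == 0x5881c4bb;
}
```

The vector version runs this same sequence on sixteen registers per block group, first on the columns and then on the diagonals of the 4x4 state, ten times over for the 20 rounds.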
diff --git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h
deleted file mode 100644
index 0c6abf77e8..0000000000
--- a/sysdeps/s390/s390-64/chacha20_arch.h
+++ /dev/null
@@ -1,45 +0,0 @@
-/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-#include <sys/auxv.h>
-
-unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst,
-					  const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-#ifdef HAVE_S390_VX_ASM_SUPPORT
-  _Static_assert (CHACHA20_BUFSIZE % 8 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 8");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-  if (GLRO(dl_hwcap) & HWCAP_S390_VX)
-    {
-      __chacha20_s390x_vx_blocks8 (state, dst, src,
-				   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-      return;
-    }
-#endif
-  chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 2ccc92b6b8..2f4f9784ee 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -380,7 +380,8 @@ sysdep_routines += xstatconv internal_statvfs \
 		   open_nocancel open64_nocancel \
 		   openat_nocancel openat64_nocancel \
 		   read_nocancel pread64_nocancel \
-		   write_nocancel statx_cp stat_t64_cp
+		   write_nocancel statx_cp stat_t64_cp \
+		   ppoll_nocancel
 
 sysdep_headers += bits/fcntl-linux.h
 
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index 65d2ceda2c..febe1ad421 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -320,6 +320,7 @@ libc {
     __read_nocancel;
     __pread64_nocancel;
     __close_nocancel;
+    __ppoll_infinity_nocancel;
     __sigtimedwait;
     # functions used by nscd
     __netlink_assert_response;
diff --git a/sysdeps/unix/sysv/linux/kernel-features.h b/sysdeps/unix/sysv/linux/kernel-features.h
index 74adc3956b..75d5f953d4 100644
--- a/sysdeps/unix/sysv/linux/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/kernel-features.h
@@ -236,4 +236,11 @@
 # define __ASSUME_FUTEX_LOCK_PI2 0
 #endif
 
+/* The getrandom() syscall was added in 3.17.  */
+#if __LINUX_KERNEL_VERSION >= 0x031100
+# define __ASSUME_GETRANDOM 1
+#else
+# define __ASSUME_GETRANDOM 0
+#endif
+
 #endif /* kernel-features.h */
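A note on the constant in the hunk above: 0x031100 is kernel 3.17.0 packed as (major << 16) + (minor << 8) + patch, with 17 written as 0x11. A small sketch of that arithmetic follows. KVER and assume_getrandom are hypothetical names used only for illustration; the patch itself compares __LINUX_KERNEL_VERSION in the preprocessor.

```c
#include <assert.h>

/* Hypothetical helper: pack a kernel version the same way the
   constant 0x031100 in the hunk is packed.  */
#define KVER(major, minor, patch) \
  (((major) << 16) + ((minor) << 8) + (patch))

/* 3.17.0 -> 0x03, 0x11, 0x00.  */
_Static_assert (KVER (3, 17, 0) == 0x031100,
                "3.17.0 must encode to 0x031100");

/* Run-time mirror of the preprocessor test: returns 1 when the
   minimum supported kernel VERSION already has getrandom.  */
static int
assume_getrandom (unsigned int version)
{
  return version >= KVER (3, 17, 0);
}
```

With this encoding a plain integer comparison orders versions correctly as long as minor and patch each fit in 8 bits.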
diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h
index 2c58d5ae2f..d3df8fa79e 100644
--- a/sysdeps/unix/sysv/linux/not-cancel.h
+++ b/sysdeps/unix/sysv/linux/not-cancel.h
@@ -23,6 +23,7 @@
 #include <sysdep.h>
 #include <errno.h>
 #include <unistd.h>
+#include <sys/poll.h>
 #include <sys/syscall.h>
 #include <sys/wait.h>
 #include <time.h>
@@ -77,6 +78,10 @@ __getrandom_nocancel (void *buf, size_t buflen, unsigned int flags)
 /* Uncancelable fcntl.  */
 __typeof (__fcntl) __fcntl64_nocancel;
 
+/* Uncancelable ppoll.  */
+int
+__ppoll_infinity_nocancel (struct pollfd *fds, nfds_t nfds);
+
 #if IS_IN (libc) || IS_IN (rtld)
 hidden_proto (__open_nocancel)
 hidden_proto (__open64_nocancel)
@@ -87,6 +92,7 @@ hidden_proto (__pread64_nocancel)
 hidden_proto (__write_nocancel)
 hidden_proto (__close_nocancel)
 hidden_proto (__fcntl64_nocancel)
+hidden_proto (__ppoll_infinity_nocancel)
 #endif
 
 #endif /* NOT_CANCEL_H  */
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/unix/sysv/linux/ppoll_nocancel.c
similarity index 62%
rename from sysdeps/generic/chacha20_arch.h
rename to sysdeps/unix/sysv/linux/ppoll_nocancel.c
index 1b4559ccbc..28c8761566 100644
--- a/sysdeps/generic/chacha20_arch.h
+++ b/sysdeps/unix/sysv/linux/ppoll_nocancel.c
@@ -1,5 +1,5 @@
-/* Chacha20 implementation, generic interface for encrypt.
-   Copyright (C) 2022 Free Software Foundation, Inc.
+/* Linux ppoll syscall implementation -- non-cancellable.
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -16,9 +16,16 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
+#include <unistd.h>
+#include <sysdep-cancel.h>
+#include <not-cancel.h>
+
+int
+__ppoll_infinity_nocancel (struct pollfd *fds, nfds_t nfds)
 {
-  chacha20_crypt_generic (state, dst, src, bytes);
+#ifndef __NR_ppoll_time64
+# define __NR_ppoll_time64 __NR_ppoll
+#endif
+  return INLINE_SYSCALL_CALL (ppoll_time64, fds, nfds, NULL, NULL, 0);
 }
+hidden_def (__ppoll_infinity_nocancel)
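The new helper calls ppoll with a NULL timeout and a NULL signal mask, i.e. it blocks until one of the descriptors is ready. A rough user-space equivalent, using the public ppoll() wrapper rather than INLINE_SYSCALL_CALL, is sketched below. wait_readable and pipe_demo are hypothetical names and not part of the patch; unlike the internal helper, the public ppoll() is a cancellation point.

```c
#define _GNU_SOURCE             /* ppoll is a Linux/GNU extension.  */
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

/* Block until FD is readable.  A NULL timeout means "wait forever"
   and a NULL sigmask leaves the signal mask untouched.  Returns the
   number of ready descriptors, or -1 with errno set.  */
static int
wait_readable (int fd)
{
  struct pollfd pfd = { .fd = fd, .events = POLLIN };
  return ppoll (&pfd, 1, NULL, NULL);
}

/* Small demonstration on a pipe that already holds one byte, so the
   wait returns immediately.  Returns wait_readable's result, or -1
   if the pipe could not be set up.  */
static int
pipe_demo (void)
{
  int fds[2];
  char byte = 'x';
  int ready = -1;

  if (pipe (fds) != 0)
    return -1;
  if (write (fds[1], &byte, 1) == 1)
    ready = wait_readable (fds[0]);
  close (fds[0]);
  close (fds[1]);
  return ready;
}
```

On targets without a separate time64 variant, the patch maps __NR_ppoll_time64 to __NR_ppoll, so the same call works on both kinds of ABI.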
diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c
index 0326ebb767..c8a9ed2d40 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.c
+++ b/sysdeps/unix/sysv/linux/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -26,13 +25,4 @@ __glibc_tls_internal_free (void)
   struct pthread *self = THREAD_SELF;
   free (self->tls_state.strsignal_buf);
   free (self->tls_state.strerror_l_buf);
-
-  if (self->tls_state.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-         cached it won't leak any data.  */
-      explicit_bzero (self->tls_state.rand_state,
-		      sizeof (*self->tls_state.rand_state));
-      free (self->tls_state.rand_state);
-    }
 }
diff --git a/sysdeps/unix/sysv/linux/tls-internal.h b/sysdeps/unix/sysv/linux/tls-internal.h
index ebc65d896a..2ebe977802 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.h
+++ b/sysdeps/unix/sysv/linux/tls-internal.h
@@ -28,7 +28,6 @@ __glibc_tls_internal (void)
   return &THREAD_SELF->tls_state;
 }
 
-/* Reset the arc4random TCB state on fork.  */
 extern void __glibc_tls_internal_free (void) attribute_hidden;
 
 #endif
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 1178475d75..c19bef2dec 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -5,13 +5,6 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += link-defines.sym
 endif
 
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
-  chacha20-amd64-sse2 \
-  chacha20-amd64-avx2 \
-  # sysdep_routines
-endif
-
 ifeq ($(subdir),gmon)
 sysdep_routines += _mcount
 # We cannot compile _mcount.S with -pg because that would create
diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S
deleted file mode 100644
index aefd1cdbd0..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-avx2.S
+++ /dev/null
@@ -1,328 +0,0 @@
-/* Optimized AVX2 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-amd64-avx2.S  -  AVX2 implementation of ChaCha20 cipher
-
-   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-
-#ifdef PIC
-#  define rRIP (%rip)
-#else
-#  define rRIP
-#endif
-
-/* register macros */
-#define INPUT %rdi
-#define DST   %rsi
-#define SRC   %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (32)
-#define STACK_VEC_X13 (32 + STACK_VEC_X12)
-#define STACK_TMP     (32 + STACK_VEC_X13)
-#define STACK_TMP1    (32 + STACK_TMP)
-
-#define STACK_MAX     (32 + STACK_TMP1)
-
-/* vector registers */
-#define X0 %ymm0
-#define X1 %ymm1
-#define X2 %ymm2
-#define X3 %ymm3
-#define X4 %ymm4
-#define X5 %ymm5
-#define X6 %ymm6
-#define X7 %ymm7
-#define X8 %ymm8
-#define X9 %ymm9
-#define X10 %ymm10
-#define X11 %ymm11
-#define X12 %ymm12
-#define X13 %ymm13
-#define X14 %ymm14
-#define X15 %ymm15
-
-#define X0h %xmm0
-#define X1h %xmm1
-#define X2h %xmm2
-#define X3h %xmm3
-#define X4h %xmm4
-#define X5h %xmm5
-#define X6h %xmm6
-#define X7h %xmm7
-#define X8h %xmm8
-#define X9h %xmm9
-#define X10h %xmm10
-#define X11h %xmm11
-#define X12h %xmm12
-#define X13h %xmm13
-#define X14h %xmm14
-#define X15h %xmm15
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0,x1,x2,x3,t1,t2) \
-	vpunpckhdq x1, x0, t2; \
-	vpunpckldq x1, x0, x0; \
-	\
-	vpunpckldq x3, x2, t1; \
-	vpunpckhdq x3, x2, x2; \
-	\
-	vpunpckhqdq t1, x0, x1; \
-	vpunpcklqdq t1, x0, x0; \
-	\
-	vpunpckhqdq x2, t2, x3; \
-	vpunpcklqdq x2, t2, x2;
-
-/* 2x2 128-bit matrix transpose */
-#define transpose_16byte_2x2(x0,x1,t1) \
-	vmovdqa    x0, t1; \
-	vperm2i128 $0x20, x1, x0, x0; \
-	vperm2i128 $0x31, x1, t1, x1;
-
-/**********************************************************************
-  8-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp)	\
-	vpsrld $(32 - (c)), v1, tmp;	\
-	vpslld $(c), v1, v1;		\
-	vpaddb tmp, v1, v1;		\
-	vpsrld $(32 - (c)), v2, tmp;	\
-	vpslld $(c), v2, v2;		\
-	vpaddb tmp, v2, v2;
-
-#define ROTATE_SHUF_2(v1,v2,shuf)	\
-	vpshufb shuf, v1, v1;		\
-	vpshufb shuf, v2, v2;
-
-#define XOR(ds,s) \
-	vpxor s, ds, ds;
-
-#define PLUS(ds,s) \
-	vpaddd s, ds, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\
-		      interleave_op1,interleave_op2,\
-		      interleave_op3,interleave_op4)		\
-	vbroadcasti128 .Lshuf_rol16 rRIP, tmp1;			\
-		interleave_op1;					\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE_SHUF_2(d1, d2, tmp1);			\
-		interleave_op2;					\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2, 12, tmp1);				\
-	vbroadcasti128 .Lshuf_rol8 rRIP, tmp1;			\
-		interleave_op3;					\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE_SHUF_2(d1, d2, tmp1);			\
-		interleave_op4;					\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2,  7, tmp1);
-
-	.section .text.avx2, "ax", @progbits
-	.align 32
-chacha20_data:
-L(shuf_rol16):
-	.byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
-L(shuf_rol8):
-	.byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
-L(inc_counter):
-	.byte 0,1,2,3,4,5,6,7
-L(unsigned_cmp):
-	.long 0x80000000
-
-	.hidden __chacha20_avx2_blocks8
-ENTRY (__chacha20_avx2_blocks8)
-	/* input:
-	 *	%rdi: input
-	 *	%rsi: dst
-	 *	%rdx: src
-	 *	%rcx: nblks (multiple of 8)
-	 */
-	vzeroupper;
-
-	pushq %rbp;
-	cfi_adjust_cfa_offset(8);
-	cfi_rel_offset(rbp, 0)
-	movq %rsp, %rbp;
-	cfi_def_cfa_register(rbp);
-
-	subq $STACK_MAX, %rsp;
-	andq $~31, %rsp;
-
-L(loop8):
-	mov $20, ROUND;
-
-	/* Construct counter vectors X12 and X13 */
-	vpmovzxbd L(inc_counter) rRIP, X0;
-	vpbroadcastd L(unsigned_cmp) rRIP, X2;
-	vpbroadcastd (12 * 4)(INPUT), X12;
-	vpbroadcastd (13 * 4)(INPUT), X13;
-	vpaddd X0, X12, X12;
-	vpxor X2, X0, X0;
-	vpxor X2, X12, X1;
-	vpcmpgtd X1, X0, X0;
-	vpsubd X0, X13, X13;
-	vmovdqa X12, (STACK_VEC_X12)(%rsp);
-	vmovdqa X13, (STACK_VEC_X13)(%rsp);
-
-	/* Load vectors */
-	vpbroadcastd (0 * 4)(INPUT), X0;
-	vpbroadcastd (1 * 4)(INPUT), X1;
-	vpbroadcastd (2 * 4)(INPUT), X2;
-	vpbroadcastd (3 * 4)(INPUT), X3;
-	vpbroadcastd (4 * 4)(INPUT), X4;
-	vpbroadcastd (5 * 4)(INPUT), X5;
-	vpbroadcastd (6 * 4)(INPUT), X6;
-	vpbroadcastd (7 * 4)(INPUT), X7;
-	vpbroadcastd (8 * 4)(INPUT), X8;
-	vpbroadcastd (9 * 4)(INPUT), X9;
-	vpbroadcastd (10 * 4)(INPUT), X10;
-	vpbroadcastd (11 * 4)(INPUT), X11;
-	vpbroadcastd (14 * 4)(INPUT), X14;
-	vpbroadcastd (15 * 4)(INPUT), X15;
-	vmovdqa X15, (STACK_TMP)(%rsp);
-
-L(round2):
-	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X15,,,,)
-	vmovdqa (STACK_TMP)(%rsp), X15;
-	vmovdqa X8, (STACK_TMP)(%rsp);
-	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,,,,)
-	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,,,,)
-	vmovdqa (STACK_TMP)(%rsp), X8;
-	vmovdqa X15, (STACK_TMP)(%rsp);
-	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X15,,,,)
-	sub $2, ROUND;
-	jnz L(round2);
-
-	vmovdqa X8, (STACK_TMP1)(%rsp);
-
-	/* tmp := X15 */
-	vpbroadcastd (0 * 4)(INPUT), X15;
-	PLUS(X0, X15);
-	vpbroadcastd (1 * 4)(INPUT), X15;
-	PLUS(X1, X15);
-	vpbroadcastd (2 * 4)(INPUT), X15;
-	PLUS(X2, X15);
-	vpbroadcastd (3 * 4)(INPUT), X15;
-	PLUS(X3, X15);
-	vpbroadcastd (4 * 4)(INPUT), X15;
-	PLUS(X4, X15);
-	vpbroadcastd (5 * 4)(INPUT), X15;
-	PLUS(X5, X15);
-	vpbroadcastd (6 * 4)(INPUT), X15;
-	PLUS(X6, X15);
-	vpbroadcastd (7 * 4)(INPUT), X15;
-	PLUS(X7, X15);
-	transpose_4x4(X0, X1, X2, X3, X8, X15);
-	transpose_4x4(X4, X5, X6, X7, X8, X15);
-	vmovdqa (STACK_TMP1)(%rsp), X8;
-	transpose_16byte_2x2(X0, X4, X15);
-	transpose_16byte_2x2(X1, X5, X15);
-	transpose_16byte_2x2(X2, X6, X15);
-	transpose_16byte_2x2(X3, X7, X15);
-	vmovdqa (STACK_TMP)(%rsp), X15;
-	vmovdqu X0, (64 * 0 + 16 * 0)(DST)
-	vmovdqu X1, (64 * 1 + 16 * 0)(DST)
-	vpbroadcastd (8 * 4)(INPUT), X0;
-	PLUS(X8, X0);
-	vpbroadcastd (9 * 4)(INPUT), X0;
-	PLUS(X9, X0);
-	vpbroadcastd (10 * 4)(INPUT), X0;
-	PLUS(X10, X0);
-	vpbroadcastd (11 * 4)(INPUT), X0;
-	PLUS(X11, X0);
-	vmovdqa (STACK_VEC_X12)(%rsp), X0;
-	PLUS(X12, X0);
-	vmovdqa (STACK_VEC_X13)(%rsp), X0;
-	PLUS(X13, X0);
-	vpbroadcastd (14 * 4)(INPUT), X0;
-	PLUS(X14, X0);
-	vpbroadcastd (15 * 4)(INPUT), X0;
-	PLUS(X15, X0);
-	vmovdqu X2, (64 * 2 + 16 * 0)(DST)
-	vmovdqu X3, (64 * 3 + 16 * 0)(DST)
-
-	/* Update counter */
-	addq $8, (12 * 4)(INPUT);
-
-	transpose_4x4(X8, X9, X10, X11, X0, X1);
-	transpose_4x4(X12, X13, X14, X15, X0, X1);
-	vmovdqu X4, (64 * 4 + 16 * 0)(DST)
-	vmovdqu X5, (64 * 5 + 16 * 0)(DST)
-	transpose_16byte_2x2(X8, X12, X0);
-	transpose_16byte_2x2(X9, X13, X0);
-	transpose_16byte_2x2(X10, X14, X0);
-	transpose_16byte_2x2(X11, X15, X0);
-	vmovdqu X6,  (64 * 6 + 16 * 0)(DST)
-	vmovdqu X7,  (64 * 7 + 16 * 0)(DST)
-	vmovdqu X8,  (64 * 0 + 16 * 2)(DST)
-	vmovdqu X9,  (64 * 1 + 16 * 2)(DST)
-	vmovdqu X10, (64 * 2 + 16 * 2)(DST)
-	vmovdqu X11, (64 * 3 + 16 * 2)(DST)
-	vmovdqu X12, (64 * 4 + 16 * 2)(DST)
-	vmovdqu X13, (64 * 5 + 16 * 2)(DST)
-	vmovdqu X14, (64 * 6 + 16 * 2)(DST)
-	vmovdqu X15, (64 * 7 + 16 * 2)(DST)
-
-	sub $8, NBLKS;
-	lea (8 * 64)(DST), DST;
-	lea (8 * 64)(SRC), SRC;
-	jnz L(loop8);
-
-	vzeroupper;
-
-	/* eax zeroed by round loop. */
-	leave;
-	cfi_adjust_cfa_offset(-8)
-	cfi_def_cfa_register(%rsp);
-	ret;
-	int3;
-END(__chacha20_avx2_blocks8)
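One detail of the removed AVX2 code that may not be obvious: AVX2 only has a signed 32-bit compare (vpcmpgtd), so the "Construct counter vectors" step flips the sign bit of both operands with the 0x80000000 constant to get an unsigned compare, and uses the result as the carry into the high counter word. A scalar sketch of that idea, with hypothetical names and one lane only:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned A > B using only a signed compare.  XOR with 0x80000000
   maps the unsigned range onto the signed range in the same order.
   The uint32_t -> int32_t conversion is implementation-defined in C;
   GCC and Clang wrap modulo 2^32, which is what is assumed here.  */
static int
unsigned_gt_via_signed (uint32_t a, uint32_t b)
{
  int32_t sa = (int32_t) (a ^ 0x80000000u);
  int32_t sb = (int32_t) (b ^ 0x80000000u);
  return sa > sb;
}

/* Counter for lane I of a block group: add I to the low word and
   propagate the carry into the high word.  The addition wrapped
   exactly when the increment is larger than the sum.  */
static void
lane_counter (uint32_t lo, uint32_t hi, uint32_t i,
              uint32_t *out_lo, uint32_t *out_hi)
{
  uint32_t sum = lo + i;
  *out_lo = sum;
  *out_hi = hi + (uint32_t) unsigned_gt_via_signed (i, sum);
}
```

In the vector code the compare yields all-ones (that is, -1) in lanes that wrapped, which is why the high word is adjusted with a subtract (vpsubd) rather than an add.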
diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S
deleted file mode 100644
index 351a1109c6..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-sse2.S
+++ /dev/null
@@ -1,311 +0,0 @@
-/* Optimized SSE2 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-amd64-ssse3.S  -  SSSE3 implementation of ChaCha20 cipher
-
-   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-#include <isa-level.h>
-
-#if MINIMUM_X86_ISA_LEVEL <= 2
-
-#ifdef PIC
-#  define rRIP (%rip)
-#else
-#  define rRIP
-#endif
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
-        ret; int3;
-
-/* register macros */
-#define INPUT %rdi
-#define DST   %rsi
-#define SRC   %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (16)
-#define STACK_VEC_X13 (16 + STACK_VEC_X12)
-#define STACK_TMP     (16 + STACK_VEC_X13)
-#define STACK_TMP1    (16 + STACK_TMP)
-#define STACK_TMP2    (16 + STACK_TMP1)
-
-#define STACK_MAX     (16 + STACK_TMP2)
-
-/* vector registers */
-#define X0 %xmm0
-#define X1 %xmm1
-#define X2 %xmm2
-#define X3 %xmm3
-#define X4 %xmm4
-#define X5 %xmm5
-#define X6 %xmm6
-#define X7 %xmm7
-#define X8 %xmm8
-#define X9 %xmm9
-#define X10 %xmm10
-#define X11 %xmm11
-#define X12 %xmm12
-#define X13 %xmm13
-#define X14 %xmm14
-#define X15 %xmm15
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \
-	movdqa    x0, t2; \
-	punpckhdq x1, t2; \
-	punpckldq x1, x0; \
-	\
-	movdqa    x2, t1; \
-	punpckldq x3, t1; \
-	punpckhdq x3, x2; \
-	\
-	movdqa     x0, x1; \
-	punpckhqdq t1, x1; \
-	punpcklqdq t1, x0; \
-	\
-	movdqa     t2, x3; \
-	punpckhqdq x2, x3; \
-	punpcklqdq x2, t2; \
-	movdqa     t2, x2;
-
-/* fill xmm register with 32-bit value from memory */
-#define PBROADCASTD(mem32, xreg) \
-	movd mem32, xreg; \
-	pshufd $0, xreg, xreg;
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp1,tmp2)	\
-	movdqa v1, tmp1; 		\
-	movdqa v2, tmp2; 		\
-	psrld $(32 - (c)), v1;		\
-	pslld $(c), tmp1;		\
-	paddb tmp1, v1;			\
-	psrld $(32 - (c)), v2;		\
-	pslld $(c), tmp2;		\
-	paddb tmp2, v2;
-
-#define XOR(ds,s) \
-	pxor s, ds;
-
-#define PLUS(ds,s) \
-	paddd s, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2)	\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE2(d1, d2, 16, tmp1, tmp2);			\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2, 12, tmp1, tmp2);			\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE2(d1, d2, 8, tmp1, tmp2);			\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2,  7, tmp1, tmp2);
-
-	.section .text.sse2,"ax",@progbits
-
-chacha20_data:
-	.align 16
-L(counter1):
-	.long 1,0,0,0
-L(inc_counter):
-	.long 0,1,2,3
-L(unsigned_cmp):
-	.long 0x80000000,0x80000000,0x80000000,0x80000000
-
-	.hidden __chacha20_sse2_blocks4
-ENTRY (__chacha20_sse2_blocks4)
-	/* input:
-	 *	%rdi: input
-	 *	%rsi: dst
-	 *	%rdx: src
-	 *	%rcx: nblks (multiple of 4)
-	 */
-
-	pushq %rbp;
-	cfi_adjust_cfa_offset(8);
-	cfi_rel_offset(rbp, 0)
-	movq %rsp, %rbp;
-	cfi_def_cfa_register(%rbp);
-
-	subq $STACK_MAX, %rsp;
-	andq $~15, %rsp;
-
-L(loop4):
-	mov $20, ROUND;
-
-	/* Construct counter vectors X12 and X13 */
-	movdqa L(inc_counter) rRIP, X0;
-	movdqa L(unsigned_cmp) rRIP, X2;
-	PBROADCASTD((12 * 4)(INPUT), X12);
-	PBROADCASTD((13 * 4)(INPUT), X13);
-	paddd X0, X12;
-	movdqa X12, X1;
-	pxor X2, X0;
-	pxor X2, X1;
-	pcmpgtd X1, X0;
-	psubd X0, X13;
-	movdqa X12, (STACK_VEC_X12)(%rsp);
-	movdqa X13, (STACK_VEC_X13)(%rsp);
-
-	/* Load vectors */
-	PBROADCASTD((0 * 4)(INPUT), X0);
-	PBROADCASTD((1 * 4)(INPUT), X1);
-	PBROADCASTD((2 * 4)(INPUT), X2);
-	PBROADCASTD((3 * 4)(INPUT), X3);
-	PBROADCASTD((4 * 4)(INPUT), X4);
-	PBROADCASTD((5 * 4)(INPUT), X5);
-	PBROADCASTD((6 * 4)(INPUT), X6);
-	PBROADCASTD((7 * 4)(INPUT), X7);
-	PBROADCASTD((8 * 4)(INPUT), X8);
-	PBROADCASTD((9 * 4)(INPUT), X9);
-	PBROADCASTD((10 * 4)(INPUT), X10);
-	PBROADCASTD((11 * 4)(INPUT), X11);
-	PBROADCASTD((14 * 4)(INPUT), X14);
-	PBROADCASTD((15 * 4)(INPUT), X15);
-	movdqa X11, (STACK_TMP)(%rsp);
-	movdqa X15, (STACK_TMP1)(%rsp);
-
-L(round2_4):
-	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X11,X15)
-	movdqa (STACK_TMP)(%rsp), X11;
-	movdqa (STACK_TMP1)(%rsp), X15;
-	movdqa X8, (STACK_TMP)(%rsp);
-	movdqa X9, (STACK_TMP1)(%rsp);
-	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,X9)
-	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,X9)
-	movdqa (STACK_TMP)(%rsp), X8;
-	movdqa (STACK_TMP1)(%rsp), X9;
-	movdqa X11, (STACK_TMP)(%rsp);
-	movdqa X15, (STACK_TMP1)(%rsp);
-	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X11,X15)
-	sub $2, ROUND;
-	jnz L(round2_4);
-
-	/* tmp := X15 */
-	movdqa (STACK_TMP)(%rsp), X11;
-	PBROADCASTD((0 * 4)(INPUT), X15);
-	PLUS(X0, X15);
-	PBROADCASTD((1 * 4)(INPUT), X15);
-	PLUS(X1, X15);
-	PBROADCASTD((2 * 4)(INPUT), X15);
-	PLUS(X2, X15);
-	PBROADCASTD((3 * 4)(INPUT), X15);
-	PLUS(X3, X15);
-	PBROADCASTD((4 * 4)(INPUT), X15);
-	PLUS(X4, X15);
-	PBROADCASTD((5 * 4)(INPUT), X15);
-	PLUS(X5, X15);
-	PBROADCASTD((6 * 4)(INPUT), X15);
-	PLUS(X6, X15);
-	PBROADCASTD((7 * 4)(INPUT), X15);
-	PLUS(X7, X15);
-	PBROADCASTD((8 * 4)(INPUT), X15);
-	PLUS(X8, X15);
-	PBROADCASTD((9 * 4)(INPUT), X15);
-	PLUS(X9, X15);
-	PBROADCASTD((10 * 4)(INPUT), X15);
-	PLUS(X10, X15);
-	PBROADCASTD((11 * 4)(INPUT), X15);
-	PLUS(X11, X15);
-	movdqa (STACK_VEC_X12)(%rsp), X15;
-	PLUS(X12, X15);
-	movdqa (STACK_VEC_X13)(%rsp), X15;
-	PLUS(X13, X15);
-	movdqa X13, (STACK_TMP)(%rsp);
-	PBROADCASTD((14 * 4)(INPUT), X15);
-	PLUS(X14, X15);
-	movdqa (STACK_TMP1)(%rsp), X15;
-	movdqa X14, (STACK_TMP1)(%rsp);
-	PBROADCASTD((15 * 4)(INPUT), X13);
-	PLUS(X15, X13);
-	movdqa X15, (STACK_TMP2)(%rsp);
-
-	/* Update counter */
-	addq $4, (12 * 4)(INPUT);
-
-	TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15);
-	movdqu X0, (64 * 0 + 16 * 0)(DST)
-	movdqu X1, (64 * 1 + 16 * 0)(DST)
-	movdqu X2, (64 * 2 + 16 * 0)(DST)
-	movdqu X3, (64 * 3 + 16 * 0)(DST)
-	TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2);
-	movdqa (STACK_TMP)(%rsp), X13;
-	movdqa (STACK_TMP1)(%rsp), X14;
-	movdqa (STACK_TMP2)(%rsp), X15;
-	movdqu X4, (64 * 0 + 16 * 1)(DST)
-	movdqu X5, (64 * 1 + 16 * 1)(DST)
-	movdqu X6, (64 * 2 + 16 * 1)(DST)
-	movdqu X7, (64 * 3 + 16 * 1)(DST)
-	TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2);
-	movdqu X8,  (64 * 0 + 16 * 2)(DST)
-	movdqu X9,  (64 * 1 + 16 * 2)(DST)
-	movdqu X10, (64 * 2 + 16 * 2)(DST)
-	movdqu X11, (64 * 3 + 16 * 2)(DST)
-	TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2);
-	movdqu X12, (64 * 0 + 16 * 3)(DST)
-	movdqu X13, (64 * 1 + 16 * 3)(DST)
-	movdqu X14, (64 * 2 + 16 * 3)(DST)
-	movdqu X15, (64 * 3 + 16 * 3)(DST)
-
-	sub $4, NBLKS;
-	lea (4 * 64)(DST), DST;
-	lea (4 * 64)(SRC), SRC;
-	jnz L(loop4);
-
-	/* eax zeroed by round loop. */
-	leave;
-	cfi_adjust_cfa_offset(-8)
-	cfi_def_cfa_register(%rsp);
-	ret_spec_stop;
-END (__chacha20_sse2_blocks4)
-
-#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */
diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
deleted file mode 100644
index 6f3784e392..0000000000
--- a/sysdeps/x86_64/chacha20_arch.h
+++ /dev/null
@@ -1,55 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <isa-level.h>
-#include <ldsodefs.h>
-#include <cpu-features.h>
-#include <sys/param.h>
-
-unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4 or 8");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-#if MINIMUM_X86_ISA_LEVEL > 2
-  __chacha20_avx2_blocks8 (state, dst, src,
-			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  const struct cpu_features* cpu_features = __get_cpu_features ();
-
-  /* AVX2 version uses vzeroupper, so disable it if RTM is enabled.  */
-  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
-      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !))
-    __chacha20_avx2_blocks8 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    __chacha20_sse2_blocks4 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#endif
-}
-- 
2.35.1



* Re: [PATCH v4] arc4random: simplify design for better safety
  2022-07-26 13:30     ` [PATCH v4] " Jason A. Donenfeld
@ 2022-07-26 15:21       ` Yann Droneaud
  2022-07-26 16:20       ` Adhemerval Zanella Netto
  2022-07-26 19:08       ` [PATCH v5] " Jason A. Donenfeld
  2 siblings, 0 replies; 81+ messages in thread
From: Yann Droneaud @ 2022-07-26 15:21 UTC (permalink / raw)
  To: Jason A. Donenfeld, libc-alpha; +Cc: Florian Weimer, Eric Biggers, linux-crypto

Hi,

On 26/07/2022 at 15:30, Jason A. Donenfeld via Libc-alpha wrote:
> Rather than buffering 16 MiB of entropy in userspace (by way of
> chacha20), simply call getrandom() every time.


I dislike the wording because

1) the current buffer is only 512 bytes, not 16 MiB;
2) the implementation reads only 48 bytes of "fresh" entropy from
getrandom() for every 16 MiB generated.

I think "stirring" or "streaming" would better describe what's
happening:

"Rather than stirring 16 MiB of random data in userspace before reseeding"


> This approach is doubtlessly slower, for now, but trying to prematurely
> optimize arc4random appears to be leading toward all sorts of nasty
> properties and gotchas. Instead, this patch takes a much more
> conservative approach. The interface is added as a basic loop wrapper
> around getrandom(), and then later, the kernel and libc can work
> together on optimizing that.
>
> This prevents numerous issues in which userspace is unaware of when it
> really must throw away its buffer, since we avoid buffering
> altogether.


I believe the cloned virtual machine issue should be explicitly 
described as a major blocker in the commit message.


> Future improvements may include userspace learning more from
> the kernel about when to do that, which might make these sorts of
> chacha20-based optimizations more possible. The current heuristic of 16
> MiB is meaningless garbage that doesn't correspond to anything the
> kernel might know about. So for now, let's just do something
> conservative that we know is correct and won't lead to cryptographic
> issues for users of this function.
>
> This patch might be considered along the lines of, "optimization is the
> root of all evil," in that the much more complex implementation it
> replaces moves too fast without considering security implications,
> whereas the incremental approach done here is a much safer way of going
> about things. Once this lands, we can take our time in optimizing this
> properly using new interplay between the kernel and userspace.
>
> getrandom(0) is used, since that's the one that ensures the bytes
> returned are cryptographically secure. But on systems without it, we
> fall back to using /dev/urandom. This is unfortunate because it means
> opening a file descriptor, but there's not much of a choice. Secondly,
> as part of the fallback, in order to get more or less the same
> properties as getrandom(0), we poll on /dev/random, and if the poll
> succeeds at least once, then we assume the RNG is initialized. This is a
> rough approximation, as the ancient "non-blocking pool" was initialized
> after the "blocking pool", not before, and it may not port back to all
> ancient kernels, but it does to a decent swath of them, so generally
> it's the best approximation we can do.
>
> The motivation for including arc4random, in the first place, is to have
> source-level compatibility with existing code. That means this patch
> doesn't attempt to litigate the interface itself. It does, however,
> choose a conservative approach for implementing it.


Sure, the arc4random() interface is inherited from *BSD, so we're not free
to improve it. But arc4random() is already in glibc git, so I think this
paragraph is of dubious value in the commit message and can be
removed.


> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Florian Weimer <fweimer@redhat.com>
> Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
> Cc: Paul Eggert <eggert@cs.ucla.edu>
> Cc: Mark Harris <mark.hsj@gmail.com>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Cc: linux-crypto@vger.kernel.org
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>   LICENSES                                      |  23 -
>   NEWS                                          |   4 +-
>   include/stdlib.h                              |   3 -
>   manual/math.texi                              |  13 +-
>   stdlib/Makefile                               |   2 -
>   stdlib/arc4random.c                           | 205 ++-----
>   stdlib/arc4random.h                           |  48 --
>   stdlib/chacha20.c                             | 191 ------
>   stdlib/tst-arc4random-chacha20.c              | 167 -----
>   sysdeps/aarch64/Makefile                      |   4 -
>   sysdeps/aarch64/chacha20-aarch64.S            | 314 ----------
>   sysdeps/aarch64/chacha20_arch.h               |  40 --
>   sysdeps/generic/tls-internal-struct.h         |   1 -
>   sysdeps/generic/tls-internal.c                |  10 -
>   sysdeps/mach/hurd/_Fork.c                     |   2 -
>   sysdeps/mach/hurd/kernel-features.h           |   1 +
>   sysdeps/nptl/_Fork.c                          |   2 -
>   .../powerpc/powerpc64/be/multiarch/Makefile   |   4 -
>   .../powerpc64/be/multiarch/chacha20-ppc.c     |   1 -
>   .../powerpc64/be/multiarch/chacha20_arch.h    |  42 --
>   sysdeps/powerpc/powerpc64/power8/Makefile     |   5 -
>   .../powerpc/powerpc64/power8/chacha20-ppc.c   | 256 --------
>   .../powerpc/powerpc64/power8/chacha20_arch.h  |  37 --
>   sysdeps/s390/s390-64/Makefile                 |   6 -
>   sysdeps/s390/s390-64/chacha20-s390x.S         | 573 ------------------
>   sysdeps/s390/s390-64/chacha20_arch.h          |  45 --
>   sysdeps/unix/sysv/linux/Makefile              |   3 +-
>   sysdeps/unix/sysv/linux/Versions              |   1 +
>   sysdeps/unix/sysv/linux/kernel-features.h     |   7 +
>   sysdeps/unix/sysv/linux/not-cancel.h          |   6 +
>   .../sysv/linux/ppoll_nocancel.c}              |  19 +-
>   sysdeps/unix/sysv/linux/tls-internal.c        |  10 -
>   sysdeps/unix/sysv/linux/tls-internal.h        |   1 -
>   sysdeps/x86_64/Makefile                       |   7 -
>   sysdeps/x86_64/chacha20-amd64-avx2.S          | 328 ----------
>   sysdeps/x86_64/chacha20-amd64-sse2.S          | 311 ----------
>   sysdeps/x86_64/chacha20_arch.h                |  55 --
>   37 files changed, 89 insertions(+), 2658 deletions(-)
>   delete mode 100644 stdlib/arc4random.h
>   delete mode 100644 stdlib/chacha20.c
>   delete mode 100644 stdlib/tst-arc4random-chacha20.c
>   delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S
>   delete mode 100644 sysdeps/aarch64/chacha20_arch.h
>   delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
>   delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
>   delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
>   delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
>   delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
>   delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
>   delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
>   rename sysdeps/{generic/chacha20_arch.h => unix/sysv/linux/ppoll_nocancel.c} (62%)
>   delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
>   delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
>   delete mode 100644 sysdeps/x86_64/chacha20_arch.h
>
> diff --git a/manual/math.texi b/manual/math.texi
> index 141695cc30..6d69bbff66 100644
> --- a/manual/math.texi
> +++ b/manual/math.texi
> @@ -1993,17 +1993,10 @@ This section describes the random number functions provided as a GNU
>   extension, based on OpenBSD interfaces.
>   
>   @Theglibc{} uses kernel entropy obtained either through @code{getrandom}
> -or by reading @file{/dev/urandom} to seed and periodically re-seed the
> -internal state.  A per-thread data pool is used, which allows fast output
> -generation.
> +or by reading @file{/dev/urandom} to seed.
>   
> -Although these functions provide higher random quality than ISO, BSD, and
> -SVID functions, these still use a Pseudo-Random generator and should not
> -be used in cryptographic contexts.
> -
> -The internal state is cleared and reseeded with kernel entropy on @code{fork}
> -and @code{_Fork}.  It is not cleared on either a direct @code{clone} syscall
> -or when using @theglibc{} @code{syscall} function.
> +These functions provide higher random quality than ISO, BSD, and SVID
> +functions, and may be used in cryptographic contexts.

+ "provided getrandom() and /dev/urandom can be used in such a
context." ;)


Thanks for the improvements. I can't wait for a vDSO getrandom() optimized
for reading 1, 2, 4, or 8 bytes :)


Regards.


-- 

Yann Droneaud

OPTEYA




* Re: [PATCH v4] arc4random: simplify design for better safety
  2022-07-26 13:30     ` [PATCH v4] " Jason A. Donenfeld
  2022-07-26 15:21       ` Yann Droneaud
@ 2022-07-26 16:20       ` Adhemerval Zanella Netto
  2022-07-26 18:36         ` Jason A. Donenfeld
  2022-07-26 19:08       ` [PATCH v5] " Jason A. Donenfeld
  2 siblings, 1 reply; 81+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-26 16:20 UTC (permalink / raw)
  To: Jason A. Donenfeld, libc-alpha
  Cc: Florian Weimer, Cristian Rodríguez, Paul Eggert,
	Mark Harris, Eric Biggers, linux-crypto



On 26/07/22 10:30, Jason A. Donenfeld wrote:

> +      l = __getrandom_nocancel (p, n, 0);
> +      if (l > 0)
> +	{
> +	  if ((size_t) l == n)
> +	    return; /* Done reading, success.  */
> +	  p = (uint8_t *) p + l;
> +	  n -= l;
> +	  continue; /* Interrupted by a signal; keep going.  */
> +	}
> +      else if (l == 0)
> +	arc4random_getrandom_failure (); /* Weird, should never happen.  */
> +      else if (l == -EINTR)
> +	continue; /* Interrupted by a signal; keep going.  */
> +      else if (!__ASSUME_GETRANDOM && l == -ENOSYS)
> +	{
> +	  atomic_store_relaxed (&have_getrandom, false);

I still think there is not much gain from this optimization: the syscall
will most likely be present, and dropping it means one less piece of static
data.  Also, we avoid using __ASSUME_GETRANDOM in generic code (all __ASSUME
usage is within sysdeps and/or nptl).
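
For reference, the retry logic in the quoted hunk can be sketched against the
public getrandom() wrapper from <sys/random.h>.  This is only an illustration,
not the patch's code: fill_random is a hypothetical name, and unlike the
internal __getrandom_nocancel (which returns a negative errno value), the
public wrapper returns -1 and sets errno.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper, not part of the patch: fill BUF with N random
   bytes, retrying on short reads and EINTR.  Returns 0 on success and
   -1 with errno set on any other failure.  */
static int
fill_random (void *buf, size_t n)
{
  uint8_t *p = buf;

  while (n > 0)
    {
      ssize_t l = getrandom (p, n, 0);
      if (l > 0)
        {
          /* Possibly a short read; advance and keep going.  */
          p += l;
          n -= (size_t) l;
        }
      else if (l == 0)
        {
          /* Should never happen for a nonzero request.  */
          errno = EIO;
          return -1;
        }
      else if (errno != EINTR)
        /* For example ENOSYS on kernels older than 3.17.  */
        return -1;
      /* On EINTR, simply loop and try again.  */
    }
  return 0;
}
```

A caller that needs a fallback would inspect errno for ENOSYS after a -1
return and switch to /dev/urandom at that point.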

> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index 2ccc92b6b8..2f4f9784ee 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -380,7 +380,8 @@ sysdep_routines += xstatconv internal_statvfs \
>  		   open_nocancel open64_nocancel \
>  		   openat_nocancel openat64_nocancel \
>  		   read_nocancel pread64_nocancel \
> -		   write_nocancel statx_cp stat_t64_cp
> +		   write_nocancel statx_cp stat_t64_cp \
> +		   ppoll_nocancel
>  
>  sysdep_headers += bits/fcntl-linux.h
>  
> diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
> index 65d2ceda2c..febe1ad421 100644
> --- a/sysdeps/unix/sysv/linux/Versions
> +++ b/sysdeps/unix/sysv/linux/Versions
> @@ -320,6 +320,7 @@ libc {
>      __read_nocancel;
>      __pread64_nocancel;
>      __close_nocancel;
> +    __ppoll_infinity_nocancel;
>      __sigtimedwait;
>      # functions used by nscd
>      __netlink_assert_response;

There is no need to export it in GLIBC_PRIVATE, since it is not currently
used outside libc.so.  Just define it as hidden (attribute_hidden).

> diff --git a/sysdeps/unix/sysv/linux/kernel-features.h b/sysdeps/unix/sysv/linux/kernel-features.h
> index 74adc3956b..75d5f953d4 100644
> --- a/sysdeps/unix/sysv/linux/kernel-features.h
> +++ b/sysdeps/unix/sysv/linux/kernel-features.h
> @@ -236,4 +236,11 @@
>  # define __ASSUME_FUTEX_LOCK_PI2 0
>  #endif
>  
> +/* The getrandom() syscall was added in 3.17.  */
> +#if __LINUX_KERNEL_VERSION >= 0x031100
> +# define __ASSUME_GETRANDOM 1
> +#else
> +# define __ASSUME_GETRANDOM 0
> +#endif
> +
>  #endif /* kernel-features.h */
> diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h
> index 2c58d5ae2f..d3df8fa79e 100644
> --- a/sysdeps/unix/sysv/linux/not-cancel.h
> +++ b/sysdeps/unix/sysv/linux/not-cancel.h
> @@ -23,6 +23,7 @@
>  #include <sysdep.h>
>  #include <errno.h>
>  #include <unistd.h>
> +#include <sys/poll.h>
>  #include <sys/syscall.h>
>  #include <sys/wait.h>
>  #include <time.h>
> @@ -77,6 +78,10 @@ __getrandom_nocancel (void *buf, size_t buflen, unsigned int flags)
>  /* Uncancelable fcntl.  */
>  __typeof (__fcntl) __fcntl64_nocancel;
>  
> +/* Uncancelable ppoll.  */
> +int
> +__ppoll_infinity_nocancel (struct pollfd *fds, nfds_t nfds);

Use attribute_hidden here and remove it from sysdeps/unix/sysv/linux/Versions.

> +
>  #if IS_IN (libc) || IS_IN (rtld)
>  hidden_proto (__open_nocancel)
>  hidden_proto (__open64_nocancel)
> @@ -87,6 +92,7 @@ hidden_proto (__pread64_nocancel)
>  hidden_proto (__write_nocancel)
>  hidden_proto (__close_nocancel)
>  hidden_proto (__fcntl64_nocancel)
> +hidden_proto (__ppoll_infinity_nocancel)
>  #endif
>  
>  #endif /* NOT_CANCEL_H  */

Also update the hurd sysdeps/mach/hurd/not-cancel.h with a wrapper to 
__poll (since it does not really support pthread cancellation).


> diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/unix/sysv/linux/ppoll_nocancel.c
> similarity index 62%
> rename from sysdeps/generic/chacha20_arch.h
> rename to sysdeps/unix/sysv/linux/ppoll_nocancel.c
> index 1b4559ccbc..28c8761566 100644
> --- a/sysdeps/generic/chacha20_arch.h
> +++ b/sysdeps/unix/sysv/linux/ppoll_nocancel.c
> @@ -1,5 +1,5 @@
> -/* Chacha20 implementation, generic interface for encrypt.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> +/* Linux ppoll syscall implementation -- non-cancellable.
> +   Copyright (C) 2018-2022 Free Software Foundation, Inc.
>     This file is part of the GNU C Library.
>  
>     The GNU C Library is free software; you can redistribute it and/or
> @@ -16,9 +16,16 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> -static inline void
> -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> -		size_t bytes)
> +#include <unistd.h>
> +#include <sysdep-cancel.h>
> +#include <not-cancel.h>
> +
> +int
> +__ppoll_infinity_nocancel (struct pollfd *fds, nfds_t nfds)
>  {
> -  chacha20_crypt_generic (state, dst, src, bytes);
> +#ifndef __NR_ppoll_time64
> +# define __NR_ppoll_time64 __NR_ppoll
> +#endif
> +  return INLINE_SYSCALL_CALL (ppoll_time64, fds, nfds, NULL, NULL, 0);
>  }
> +hidden_def (__ppoll_infinity_nocancel)

Maybe just add an inline wrapper in sysdeps/unix/sysv/linux/not-cancel.h,
as for __getrandom_nocancel:

  static inline int
  __ppoll_infinity_nocancel (struct pollfd *fds, nfds_t nfds)
  {
  #ifndef __NR_ppoll_time64
  # define __NR_ppoll_time64 __NR_ppoll
  #endif
    return INLINE_SYSCALL_CALL (ppoll_time64, fds, nfds, NULL, NULL, 0);
  }

It avoids a lot of the boilerplate code needed to add the internal symbol.
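
As a rough userspace illustration of what the fallback uses this wrapper for
(blocking until a descriptor such as /dev/random becomes readable), here is a
sketch with plain poll() and a timeout of -1, which blocks indefinitely just
as ppoll does with a NULL timeout.  The name wait_readable is hypothetical and
the code is not taken from the patch.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: block until FD is readable.  Returns 0 once
   POLLIN is reported, or -1 on error (including POLLERR/POLLHUP
   without POLLIN).  */
static int
wait_readable (int fd)
{
  struct pollfd pfd = { .fd = fd, .events = POLLIN };
  int r;

  /* A timeout of -1 means "wait forever"; restart if a signal
     interrupts the wait.  */
  do
    r = poll (&pfd, 1, -1);
  while (r < 0 && errno == EINTR);

  return (r == 1 && (pfd.revents & POLLIN)) ? 0 : -1;
}
```

In the fallback described in the commit message, the descriptor would come
from opening /dev/random, and a single successful wait is taken as a sign that
the RNG is initialized before reading from /dev/urandom.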


* Re: [PATCH v4] arc4random: simplify design for better safety
  2022-07-26 16:20       ` Adhemerval Zanella Netto
@ 2022-07-26 18:36         ` Jason A. Donenfeld
  0 siblings, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 18:36 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Florian Weimer, Cristian Rodríguez, Paul Eggert,
	Mark Harris, Eric Biggers, linux-crypto

Hi Adhemerval,

On Tue, Jul 26, 2022 at 01:20:11PM -0300, Adhemerval Zanella Netto wrote:
> > +	{
> > +	  atomic_store_relaxed (&have_getrandom, false);
> 
> I still think there is not much gain from this optimization: the syscall
> will most likely be present, and dropping it means one less piece of static
> data.  Also, we avoid using __ASSUME_GETRANDOM in generic code (all __ASSUME
> usage is within sysdeps and/or nptl).

Oh! *That's* what you were talking about before. Sorry I didn't catch
your meaning the first time through.

Okay so you're alright having +1 syscall overhead on old systems, so
that new systems can have a byte less of static data. I don't hold any
opinions either way there and will defer to your expertise, so I'll get
rid of this part on v5.
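
To make the trade-off concrete, here is a toy sketch of the flag being
dropped.  All names are hypothetical and the random sources are stubs; the
real code calls __getrandom_nocancel and reads /dev/urandom.  With the flag,
only the first call pays for the failing syscall; without it, every call on an
old kernel does.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical signature: returns a byte count or a negative errno
   value, like the internal nocancel wrappers.  */
typedef ssize_t (*random_source) (void *buf, size_t len);

/* The one byte of static data under discussion.  */
static atomic_bool have_primary = true;

static ssize_t
fill_cached (random_source primary, random_source fallback,
             void *buf, size_t len)
{
  if (atomic_load_explicit (&have_primary, memory_order_relaxed))
    {
      ssize_t l = primary (buf, len);
      if (l != -ENOSYS)
        return l;
      /* Remember the failure so later calls skip the doomed syscall.  */
      atomic_store_explicit (&have_primary, false, memory_order_relaxed);
    }
  return fallback (buf, len);
}

/* Stubs for illustration only: a "syscall" that is never available,
   and a fallback that fills the buffer with a fixed pattern.  */
static int primary_calls;

static ssize_t
stub_missing (void *buf, size_t len)
{
  (void) buf;
  (void) len;
  primary_calls++;
  return -ENOSYS;
}

static ssize_t
stub_fallback (void *buf, size_t len)
{
  memset (buf, 0xAA, len);
  return (ssize_t) len;
}
```

Calling fill_cached (stub_missing, stub_fallback, ...) twice invokes
stub_missing only once; removing the flag trades that saved call for simpler
code and no static state.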

> > +    __ppoll_infinity_nocancel;
> >      __sigtimedwait;
> >      # functions used by nscd
> >      __netlink_assert_response;
> 
> There is no need to export it in GLIBC_PRIVATE, since it is not currently
> used outside libc.so.  Just define it as hidden (attribute_hidden).
> Use attribute_hidden here and remove it from sysdeps/unix/sysv/linux/Versions.
> Maybe just add an inline wrapper in sysdeps/unix/sysv/linux/not-cancel.h,
> as for __getrandom_nocancel:
> It avoids a lot of the boilerplate code needed to add the internal symbol.

Okay I'll skip all the symbol stuff and just do the static inline like
getrandom has. Thanks for the suggestion; that's a lot simpler.

> Also update the hurd sysdeps/mach/hurd/not-cancel.h with a wrapper to 
> __poll (since it does not really support pthread cancellation).

Ack.

Thanks for the comments. v5 coming up shortly.

Jason


* [PATCH v5] arc4random: simplify design for better safety
  2022-07-26 13:30     ` [PATCH v4] " Jason A. Donenfeld
  2022-07-26 15:21       ` Yann Droneaud
  2022-07-26 16:20       ` Adhemerval Zanella Netto
@ 2022-07-26 19:08       ` Jason A. Donenfeld
  2022-07-26 19:58         ` [PATCH v6] " Jason A. Donenfeld
  2 siblings, 1 reply; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 19:08 UTC (permalink / raw)
  To: libc-alpha
  Cc: Jason A. Donenfeld, Adhemerval Zanella Netto, Florian Weimer,
	Cristian Rodríguez, Paul Eggert, Mark Harris, Eric Biggers,
	linux-crypto

Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc can work
together on optimizing that.

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering
altogether. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.

This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things. Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fall back to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties as getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" was initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, but it does to a decent swath of them, so generally
it's the best approximation we can do.

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 LICENSES                                      |  23 -
 NEWS                                          |   4 +-
 include/stdlib.h                              |   3 -
 manual/math.texi                              |  13 +-
 stdlib/Makefile                               |   2 -
 stdlib/arc4random.c                           | 195 ++----
 stdlib/arc4random.h                           |  48 --
 stdlib/chacha20.c                             | 191 ------
 stdlib/tst-arc4random-chacha20.c              | 167 -----
 sysdeps/aarch64/Makefile                      |   4 -
 sysdeps/aarch64/chacha20-aarch64.S            | 314 ----------
 sysdeps/aarch64/chacha20_arch.h               |  40 --
 sysdeps/generic/chacha20_arch.h               |  24 -
 sysdeps/generic/tls-internal-struct.h         |   1 -
 sysdeps/generic/tls-internal.c                |  10 -
 sysdeps/mach/hurd/_Fork.c                     |   2 -
 sysdeps/mach/hurd/kernel-features.h           |   1 +
 sysdeps/mach/hurd/not-cancel.h                |   3 +
 sysdeps/nptl/_Fork.c                          |   2 -
 .../powerpc/powerpc64/be/multiarch/Makefile   |   4 -
 .../powerpc64/be/multiarch/chacha20-ppc.c     |   1 -
 .../powerpc64/be/multiarch/chacha20_arch.h    |  42 --
 sysdeps/powerpc/powerpc64/power8/Makefile     |   5 -
 .../powerpc/powerpc64/power8/chacha20-ppc.c   | 256 --------
 .../powerpc/powerpc64/power8/chacha20_arch.h  |  37 --
 sysdeps/s390/s390-64/Makefile                 |   6 -
 sysdeps/s390/s390-64/chacha20-s390x.S         | 573 ------------------
 sysdeps/s390/s390-64/chacha20_arch.h          |  45 --
 sysdeps/unix/sysv/linux/kernel-features.h     |   7 +
 sysdeps/unix/sysv/linux/not-cancel.h          |  11 +-
 sysdeps/unix/sysv/linux/tls-internal.c        |  10 -
 sysdeps/unix/sysv/linux/tls-internal.h        |   1 -
 sysdeps/x86_64/Makefile                       |   7 -
 sysdeps/x86_64/chacha20-amd64-avx2.S          | 328 ----------
 sysdeps/x86_64/chacha20-amd64-sse2.S          | 311 ----------
 sysdeps/x86_64/chacha20_arch.h                |  55 --
 36 files changed, 70 insertions(+), 2676 deletions(-)
 delete mode 100644 stdlib/arc4random.h
 delete mode 100644 stdlib/chacha20.c
 delete mode 100644 stdlib/tst-arc4random-chacha20.c
 delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S
 delete mode 100644 sysdeps/aarch64/chacha20_arch.h
 delete mode 100644 sysdeps/generic/chacha20_arch.h
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
 delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
 delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
 delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
 delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
 delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
 delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
 delete mode 100644 sysdeps/x86_64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index cd04fb6e84..530893b1dc 100644
--- a/LICENSES
+++ b/LICENSES
@@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov>
  You should have received a copy of the GNU Lesser General Public
  License along with this library; if not, see
  <https://www.gnu.org/licenses/>.  */
-\f
-sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S,
-sysdeps/x86_64/chacha20-amd64-avx2.S, and
-sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and
-sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt,
-with the following notices:
-
-Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-This file is part of Libgcrypt.
-
-Libgcrypt is free software; you can redistribute it and/or modify
-it under the terms of the GNU Lesser General Public License as
-published by the Free Software Foundation; either version 2.1 of
-the License, or (at your option) any later version.
-
-Libgcrypt is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU Lesser General Public License for more details.
-
-You should have received a copy of the GNU Lesser General Public
-License along with this program; if not, see <https://www.gnu.org/licenses/>.
diff --git a/NEWS b/NEWS
index 8420a65cd0..fe531bfe1e 100644
--- a/NEWS
+++ b/NEWS
@@ -61,8 +61,8 @@ Major new features:
   is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type).
 
 * The functions arc4random, arc4random_buf, and arc4random_uniform have been
-  added.  The functions use a pseudo-random number generator along with
-  entropy from the kernel.
+  added.  The functions wrap getrandom and/or /dev/urandom to return high-
+  quality randomness from the kernel.
 
 Deprecated and removed features, and other changes affecting compatibility:
 
diff --git a/include/stdlib.h b/include/stdlib.h
index cae7f7cdf8..db51f4a4f6 100644
--- a/include/stdlib.h
+++ b/include/stdlib.h
@@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform;
 libc_hidden_proto (__arc4random_uniform);
 extern void __arc4random_buf_internal (void *buffer, size_t len)
      attribute_hidden;
-/* Called from the fork function to reinitialize the internal cipher state
-   in child process.  */
-extern void __arc4random_fork_subprocess (void) attribute_hidden;
 
 extern double __strtod_internal (const char *__restrict __nptr,
 				 char **__restrict __endptr, int __group)
diff --git a/manual/math.texi b/manual/math.texi
index 141695cc30..6d69bbff66 100644
--- a/manual/math.texi
+++ b/manual/math.texi
@@ -1993,17 +1993,10 @@ This section describes the random number functions provided as a GNU
 extension, based on OpenBSD interfaces.
 
 @Theglibc{} uses kernel entropy obtained either through @code{getrandom}
-or by reading @file{/dev/urandom} to seed and periodically re-seed the
-internal state.  A per-thread data pool is used, which allows fast output
-generation.
+or by reading @file{/dev/urandom} to seed.
 
-Although these functions provide higher random quality than ISO, BSD, and
-SVID functions, these still use a Pseudo-Random generator and should not
-be used in cryptographic contexts.
-
-The internal state is cleared and reseeded with kernel entropy on @code{fork}
-and @code{_Fork}.  It is not cleared on either a direct @code{clone} syscall
-or when using @theglibc{} @code{syscall} function.
+These functions provide higher random quality than ISO, BSD, and SVID
+functions, and may be used in cryptographic contexts.
 
 The prototypes for these functions are in @file{stdlib.h}.
 @pindex stdlib.h
diff --git a/stdlib/Makefile b/stdlib/Makefile
index a900962685..f7b25c1981 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -246,7 +246,6 @@ tests := \
   # tests
 
 tests-internal := \
-  tst-arc4random-chacha20 \
   tst-strtod1i \
   tst-strtod3 \
   tst-strtod4 \
@@ -256,7 +255,6 @@ tests-internal := \
   # tests-internal
 
 tests-static := \
-  tst-arc4random-chacha20 \
   tst-secure-getenv \
   # tests-static
 
diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
index 65547e79aa..e819af0c99 100644
--- a/stdlib/arc4random.c
+++ b/stdlib/arc4random.c
@@ -1,4 +1,4 @@
-/* Pseudo Random Number Generator based on ChaCha20.
+/* Pseudo Random Number Generator
    Copyright (C) 2022 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <arc4random.h>
 #include <errno.h>
 #include <not-cancel.h>
 #include <stdio.h>
@@ -24,53 +23,6 @@
 #include <sys/mman.h>
 #include <sys/param.h>
 #include <sys/random.h>
-#include <tls-internal.h>
-
-/* arc4random keeps two counters: 'have' is the current valid bytes not yet
-   consumed in 'buf' while 'count' is the maximum number of bytes until a
-   reseed.
-
-   Both the initial seed and reseed try to obtain entropy from the kernel
-   and abort the process if none could be obtained.
-
-   The state 'buf' improves the usage of the cipher calls, allowing to call
-   optimized implementations (if the architecture provides it) and minimize
-   function call overhead.  */
-
-#include <chacha20.c>
-
-/* Called from the fork function to reset the state.  */
-void
-__arc4random_fork_subprocess (void)
-{
-  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
-  if (state != NULL)
-    {
-      explicit_bzero (state, sizeof (*state));
-      /* Force key init.  */
-      state->count = -1;
-    }
-}
-
-/* Return the current thread random state or try to create one if there is
-   none available.  In the case malloc can not allocate a state, arc4random
-   will try to get entropy with arc4random_getentropy.  */
-static struct arc4random_state_t *
-arc4random_get_state (void)
-{
-  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
-  if (state == NULL)
-    {
-      state = malloc (sizeof (struct arc4random_state_t));
-      if (state != NULL)
-	{
-	  /* Force key initialization on first call.  */
-	  state->count = -1;
-	  __glibc_tls_internal ()->rand_state = state;
-	}
-    }
-  return state;
-}
 
 static void
 arc4random_getrandom_failure (void)
@@ -78,106 +30,62 @@ arc4random_getrandom_failure (void)
   __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
 }
 
-static void
-arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen)
+void
+__arc4random_buf (void *p, size_t n)
 {
-  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
-
-  /* Mix optional user provided data.  */
-  if (rnd != NULL)
-    {
-      size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-      for (size_t i = 0; i < m; i++)
-	state->buf[i] ^= rnd[i];
-    }
-
-  /* Immediately reinit for backtracking resistance.  */
-  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
-  explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-}
+  static bool seen_initialized = false;
+  ssize_t l;
+  int fd;
 
-static void
-arc4random_getentropy (void *rnd, size_t len)
-{
-  if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len)
+  if (n == 0)
     return;
 
-  int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom",
-						  O_RDONLY | O_CLOEXEC));
-  if (fd != -1)
+  for (;;)
     {
-      uint8_t *p = rnd;
-      uint8_t *end = p + len;
-      do
+      l = TEMP_FAILURE_RETRY (__getrandom_nocancel (p, n, 0));
+      if (l > 0)
 	{
-	  ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
-	  if (ret <= 0)
-	    arc4random_getrandom_failure ();
-	  p += ret;
+	  if ((size_t) l == n)
+	    return; /* Done reading, success.  */
+	  p = (uint8_t *) p + l;
+	  n -= l;
+	  continue; /* Interrupted by a signal; keep going.  */
 	}
-      while (p < end);
-
-      if (__close_nocancel (fd) == 0)
-	return;
+      else if (!__ASSUME_GETRANDOM && l < 0 && errno == ENOSYS)
+	break; /* No syscall, so fallback to /dev/urandom.  */
+      arc4random_getrandom_failure ();
     }
-  arc4random_getrandom_failure ();
-}
 
-/* Check if the thread context STATE should be reseed with kernel entropy
-   depending of requested LEN bytes.  If there is less than requested,
-   the state is either initialized or reseeded, otherwise the internal
-   counter subtract the requested length.  */
-static void
-arc4random_check_stir (struct arc4random_state_t *state, size_t len)
-{
-  if (state->count <= len || state->count == -1)
+  if (!atomic_load_relaxed (&seen_initialized))
     {
-      uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
-      arc4random_getentropy (rnd, sizeof rnd);
-
-      if (state->count == -1)
-	chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
-      else
-	arc4random_rekey (state, rnd, sizeof rnd);
-
-      explicit_bzero (rnd, sizeof rnd);
-
-      /* Invalidate the buf.  */
-      state->have = 0;
-      memset (state->buf, 0, sizeof state->buf);
-      state->count = CHACHA20_RESEED_SIZE;
+      struct pollfd pfd = { .events = POLLIN };
+      pfd.fd = TEMP_FAILURE_RETRY (
+	  __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
+      if (pfd.fd < 0)
+	arc4random_getrandom_failure ();
+      if (TEMP_FAILURE_RETRY (__poll_infinity_nocancel (&pfd, 1)) < 0)
+	arc4random_getrandom_failure ();
+      if (__close_nocancel (pfd.fd) < 0)
+	arc4random_getrandom_failure ();
+      atomic_store_relaxed (&seen_initialized, true);
     }
-  else
-    state->count -= len;
-}
 
-void
-__arc4random_buf (void *buffer, size_t len)
-{
-  struct arc4random_state_t *state = arc4random_get_state ();
-  if (__glibc_unlikely (state == NULL))
-    {
-      arc4random_getentropy (buffer, len);
-      return;
-    }
-
-  arc4random_check_stir (state, len);
-  while (len > 0)
+  fd = TEMP_FAILURE_RETRY (
+      __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY));
+  if (fd < 0)
+    arc4random_getrandom_failure ();
+  for (;;)
     {
-      if (state->have > 0)
-	{
-	  size_t m = MIN (len, state->have);
-	  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
-	  memcpy (buffer, ks, m);
-	  explicit_bzero (ks, m);
-	  buffer += m;
-	  len -= m;
-	  state->have -= m;
-	}
-      if (state->have == 0)
-	arc4random_rekey (state, NULL, 0);
+      l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
+      if (l <= 0)
+	arc4random_getrandom_failure ();
+      if ((size_t) l == n)
+	break; /* Done reading, success.  */
+      p = (uint8_t *) p + l;
+      n -= l;
     }
+  if (__close_nocancel (fd) < 0)
+    arc4random_getrandom_failure ();
 }
 libc_hidden_def (__arc4random_buf)
 weak_alias (__arc4random_buf, arc4random_buf)
@@ -186,22 +94,7 @@ uint32_t
 __arc4random (void)
 {
   uint32_t r;
-
-  struct arc4random_state_t *state = arc4random_get_state ();
-  if (__glibc_unlikely (state == NULL))
-    {
-      arc4random_getentropy (&r, sizeof (uint32_t));
-      return r;
-    }
-
-  arc4random_check_stir (state, sizeof (uint32_t));
-  if (state->have < sizeof (uint32_t))
-    arc4random_rekey (state, NULL, 0);
-  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
-  memcpy (&r, ks, sizeof (uint32_t));
-  memset (ks, 0, sizeof (uint32_t));
-  state->have -= sizeof (uint32_t);
-
+  __arc4random_buf (&r, sizeof (r));
   return r;
 }
 libc_hidden_def (__arc4random)
diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
deleted file mode 100644
index cd39389c19..0000000000
--- a/stdlib/arc4random.h
+++ /dev/null
@@ -1,48 +0,0 @@
-/* Arc4random definition used on TLS.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#ifndef _CHACHA20_H
-#define _CHACHA20_H
-
-#include <stddef.h>
-#include <stdint.h>
-
-/* Internal ChaCha20 state.  */
-#define CHACHA20_STATE_LEN	16
-#define CHACHA20_BLOCK_SIZE	64
-
-/* Maximum number bytes until reseed (16 MB).  */
-#define CHACHA20_RESEED_SIZE	(16 * 1024 * 1024)
-
-/* Internal arc4random buffer, used on each feedback step so offer some
-   backtracking protection and to allow better used of vectorized
-   chacha20 implementations.  */
-#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
-
-_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE,
-		"CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE");
-
-struct arc4random_state_t
-{
-  uint32_t ctx[CHACHA20_STATE_LEN];
-  size_t have;
-  size_t count;
-  uint8_t buf[CHACHA20_BUFSIZE];
-};
-
-#endif
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
deleted file mode 100644
index 2745a81315..0000000000
--- a/stdlib/chacha20.c
+++ /dev/null
@@ -1,191 +0,0 @@
-/* Generic ChaCha20 implementation (used on arc4random).
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <array_length.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <string.h>
-
-/* 32-bit stream position, then 96-bit nonce.  */
-#define CHACHA20_IV_SIZE	16
-#define CHACHA20_KEY_SIZE	32
-
-#define CHACHA20_STATE_LEN	16
-
-/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final
-   XOR of the keystream with the plaintext because the plaintext is a
-   stream of zeros.  */
-
-enum chacha20_constants
-{
-  CHACHA20_CONSTANT_EXPA = 0x61707865U,
-  CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
-  CHACHA20_CONSTANT_2_BY = 0x79622d32U,
-  CHACHA20_CONSTANT_TE_K = 0x6b206574U
-};
-
-static inline uint32_t
-read_unaligned_32 (const uint8_t *p)
-{
-  uint32_t r;
-  memcpy (&r, p, sizeof (r));
-  return r;
-}
-
-static inline void
-write_unaligned_32 (uint8_t *p, uint32_t v)
-{
-  memcpy (p, &v, sizeof (v));
-}
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
-# define set_state(v)		__builtin_bswap32 ((v))
-#else
-# define read_unaligned_le32(p) read_unaligned_32 ((p))
-# define set_state(v)		(v)
-#endif
-
-static inline void
-chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
-{
-  state[0]  = CHACHA20_CONSTANT_EXPA;
-  state[1]  = CHACHA20_CONSTANT_ND_3;
-  state[2]  = CHACHA20_CONSTANT_2_BY;
-  state[3]  = CHACHA20_CONSTANT_TE_K;
-
-  state[4]  = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
-  state[5]  = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
-  state[6]  = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
-  state[7]  = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
-  state[8]  = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
-  state[9]  = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
-  state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
-  state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
-
-  state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
-  state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
-  state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
-  state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
-}
-
-static inline uint32_t
-rotl32 (unsigned int shift, uint32_t word)
-{
-  return (word << (shift & 31)) | (word >> ((-shift) & 31));
-}
-
-static void
-state_final (const uint8_t *src, uint8_t *dst, uint32_t v)
-{
-#ifdef CHACHA20_XOR_FINAL
-  v ^= read_unaligned_32 (src);
-#endif
-  write_unaligned_32 (dst, v);
-}
-
-static inline void
-chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src)
-{
-  uint32_t x0, x1, x2, x3, x4, x5, x6, x7;
-  uint32_t x8, x9, x10, x11, x12, x13, x14, x15;
-
-  x0 = state[0];
-  x1 = state[1];
-  x2 = state[2];
-  x3 = state[3];
-  x4 = state[4];
-  x5 = state[5];
-  x6 = state[6];
-  x7 = state[7];
-  x8 = state[8];
-  x9 = state[9];
-  x10 = state[10];
-  x11 = state[11];
-  x12 = state[12];
-  x13 = state[13];
-  x14 = state[14];
-  x15 = state[15];
-
-  for (int i = 0; i < 20; i += 2)
-    {
-#define QROUND(_x0, _x1, _x2, _x3) 			\
-  do {							\
-   _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); 	\
-   _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); 	\
-   _x0 = _x0 + _x1; _x3 = rotl32 (8,  (_x0 ^ _x3));	\
-   _x2 = _x2 + _x3; _x1 = rotl32 (7,  (_x1 ^ _x2));	\
-  } while(0)
-
-      QROUND (x0, x4, x8,  x12);
-      QROUND (x1, x5, x9,  x13);
-      QROUND (x2, x6, x10, x14);
-      QROUND (x3, x7, x11, x15);
-
-      QROUND (x0, x5, x10, x15);
-      QROUND (x1, x6, x11, x12);
-      QROUND (x2, x7, x8,  x13);
-      QROUND (x3, x4, x9,  x14);
-    }
-
-  state_final (&src[0], &dst[0], set_state (x0 + state[0]));
-  state_final (&src[4], &dst[4], set_state (x1 + state[1]));
-  state_final (&src[8], &dst[8], set_state (x2 + state[2]));
-  state_final (&src[12], &dst[12], set_state (x3 + state[3]));
-  state_final (&src[16], &dst[16], set_state (x4 + state[4]));
-  state_final (&src[20], &dst[20], set_state (x5 + state[5]));
-  state_final (&src[24], &dst[24], set_state (x6 + state[6]));
-  state_final (&src[28], &dst[28], set_state (x7 + state[7]));
-  state_final (&src[32], &dst[32], set_state (x8 + state[8]));
-  state_final (&src[36], &dst[36], set_state (x9 + state[9]));
-  state_final (&src[40], &dst[40], set_state (x10 + state[10]));
-  state_final (&src[44], &dst[44], set_state (x11 + state[11]));
-  state_final (&src[48], &dst[48], set_state (x12 + state[12]));
-  state_final (&src[52], &dst[52], set_state (x13 + state[13]));
-  state_final (&src[56], &dst[56], set_state (x14 + state[14]));
-  state_final (&src[60], &dst[60], set_state (x15 + state[15]));
-
-  state[12]++;
-}
-
-static void
-__attribute_maybe_unused__
-chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
-			size_t bytes)
-{
-  while (bytes >= CHACHA20_BLOCK_SIZE)
-    {
-      chacha20_block (state, dst, src);
-
-      bytes -= CHACHA20_BLOCK_SIZE;
-      dst += CHACHA20_BLOCK_SIZE;
-      src += CHACHA20_BLOCK_SIZE;
-    }
-
-  if (__glibc_unlikely (bytes != 0))
-    {
-      uint8_t stream[CHACHA20_BLOCK_SIZE];
-      chacha20_block (state, stream, src);
-      memcpy (dst, stream, bytes);
-      explicit_bzero (stream, sizeof stream);
-    }
-}
-
-/* Get the architecture optimized version.  */
-#include <chacha20_arch.h>
diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c
deleted file mode 100644
index 45ba54920d..0000000000
--- a/stdlib/tst-arc4random-chacha20.c
+++ /dev/null
@@ -1,167 +0,0 @@
-/* Basic tests for chacha20 cypher used in arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <arc4random.h>
-#include <support/check.h>
-#include <sys/cdefs.h>
-
-/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random
-   actual does.  */
-#include <chacha20.c>
-
-static int
-do_test (void)
-{
-  const uint8_t key[CHACHA20_KEY_SIZE] =
-    {
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-    };
-  const uint8_t iv[CHACHA20_IV_SIZE] =
-    {
-      0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-    };
-  const uint8_t expected1[CHACHA20_BUFSIZE] =
-    {
-      0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a,
-      0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d,
-      0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda,
-      0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f,
-      0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1,
-      0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07,
-      0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73,
-      0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69,
-      0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21,
-      0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8,
-      0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac,
-      0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6,
-      0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0,
-      0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3,
-      0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f,
-      0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e,
-      0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71,
-      0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3,
-      0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5,
-      0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34,
-      0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d,
-      0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c,
-      0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31,
-      0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad,
-      0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0,
-      0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72,
-      0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16,
-      0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9,
-      0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a,
-      0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc,
-      0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2,
-      0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb,
-      0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10,
-      0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b,
-      0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2,
-      0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17,
-      0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca,
-      0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84,
-      0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3,
-      0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1,
-      0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7,
-      0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9,
-      0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3,
-      0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e,
-      0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb,
-      0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8,
-      0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb
-    };
-
-  const uint8_t expected2[CHACHA20_BUFSIZE] =
-    {
-      0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36,
-      0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22,
-      0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50,
-      0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10,
-      0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef,
-      0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30,
-      0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c,
-      0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6,
-      0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e,
-      0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90,
-      0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3,
-      0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba,
-      0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4,
-      0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea,
-      0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a,
-      0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80,
-      0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2,
-      0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8,
-      0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce,
-      0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb,
-      0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9,
-      0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc,
-      0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86,
-      0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0,
-      0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4,
-      0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1,
-      0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4,
-      0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f,
-      0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57,
-      0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97,
-      0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef,
-      0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13,
-      0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65,
-      0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30,
-      0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6,
-      0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee,
-      0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c,
-      0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66,
-      0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a,
-      0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c,
-      0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65,
-      0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b,
-      0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88,
-      0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1,
-      0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0,
-      0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36,
-      0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18
-    };
-
-  /* Check with the expected internal arc4random keystream buffer.  Some
-     architecture optimizations expects a buffer with a minimum size which
-     is a multiple of then ChaCha20 blocksize, so they might not be prepared
-     to handle smaller buffers.  */
-
-  uint8_t output[CHACHA20_BUFSIZE];
-
-  uint32_t state[CHACHA20_STATE_LEN];
-  chacha20_init (state, key, iv);
-
-  /* Check with the initial state.  */
-  uint8_t input[CHACHA20_BUFSIZE] = { 0 };
-
-  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
-  TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE);
-
-  /* And on the next round.  */
-  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
-  TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE);
-
-  return 0;
-}
-
-#include <support/test-driver.c>
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 7dfd1b62dd..17fb1c5b72 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -51,10 +51,6 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += tlsdesc.sym
 endif
 
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-aarch64
-endif
-
 ifeq ($(subdir),gmon)
 CFLAGS-mcount.c += -mgeneral-regs-only
 endif
diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S
deleted file mode 100644
index cce5291c5c..0000000000
--- a/sysdeps/aarch64/chacha20-aarch64.S
+++ /dev/null
@@ -1,314 +0,0 @@
-/* Optimized AArch64 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-
-/* Only LE is supported.  */
-#ifdef __AARCH64EL__
-
-#define GET_DATA_POINTER(reg, name) \
-        adrp    reg, name ; \
-        add     reg, reg, :lo12:name
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
-        ret; dsb sy; isb;
-
-.cpu generic+simd
-
-.text
-
-/* register macros */
-#define INPUT     x0
-#define DST       x1
-#define SRC       x2
-#define NBLKS     x3
-#define ROUND     x4
-#define INPUT_CTR x5
-#define INPUT_POS x6
-#define CTR       x7
-
-/* vector registers */
-#define X0 v16
-#define X4 v17
-#define X8 v18
-#define X12 v19
-
-#define X1 v20
-#define X5 v21
-
-#define X9 v22
-#define X13 v23
-#define X2 v24
-#define X6 v25
-
-#define X3 v26
-#define X7 v27
-#define X11 v28
-#define X15 v29
-
-#define X10 v30
-#define X14 v31
-
-#define VCTR    v0
-#define VTMP0   v1
-#define VTMP1   v2
-#define VTMP2   v3
-#define VTMP3   v4
-#define X12_TMP v5
-#define X13_TMP v6
-#define ROT8    v7
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-#define _(...) __VA_ARGS__
-
-#define vpunpckldq(s1, s2, dst) \
-	zip1 dst.4s, s2.4s, s1.4s;
-
-#define vpunpckhdq(s1, s2, dst) \
-	zip2 dst.4s, s2.4s, s1.4s;
-
-#define vpunpcklqdq(s1, s2, dst) \
-	zip1 dst.2d, s2.2d, s1.2d;
-
-#define vpunpckhqdq(s1, s2, dst) \
-	zip2 dst.2d, s2.2d, s1.2d;
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
-	vpunpckhdq(x1, x0, t2); \
-	vpunpckldq(x1, x0, x0); \
-	\
-	vpunpckldq(x3, x2, t1); \
-	vpunpckhdq(x3, x2, x2); \
-	\
-	vpunpckhqdq(t1, x0, x1); \
-	vpunpcklqdq(t1, x0, x0); \
-	\
-	vpunpckhqdq(x2, t2, x3); \
-	vpunpcklqdq(x2, t2, x2);
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define XOR(d,s1,s2) \
-	eor d.16b, s2.16b, s1.16b;
-
-#define PLUS(ds,s) \
-	add ds.4s, ds.4s, s.4s;
-
-#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \
-	shl dst1.4s, src1.4s, #(c);		\
-	shl dst2.4s, src2.4s, #(c);		\
-	shl dst3.4s, src3.4s, #(c);		\
-	shl dst4.4s, src4.4s, #(c);		\
-	sri dst1.4s, src1.4s, #(32 - (c));	\
-	sri dst2.4s, src2.4s, #(32 - (c));	\
-	sri dst3.4s, src3.4s, #(32 - (c));	\
-	sri dst4.4s, src4.4s, #(32 - (c));
-
-#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
-	tbl dst1.16b, {src1.16b}, ROT8.16b;     \
-	tbl dst2.16b, {src2.16b}, ROT8.16b;	\
-	tbl dst3.16b, {src3.16b}, ROT8.16b;	\
-	tbl dst4.16b, {src4.16b}, ROT8.16b;
-
-#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
-	rev32 dst1.8h, src1.8h;			\
-	rev32 dst2.8h, src2.8h;			\
-	rev32 dst3.8h, src3.8h;			\
-	rev32 dst4.8h, src4.8h;
-
-#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \
-	PLUS(a1,b1); PLUS(a2,b2);						\
-	PLUS(a3,b3); PLUS(a4,b4);						\
-	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
-	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
-		ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4);		\
-	PLUS(c1,d1); PLUS(c2,d2);						\
-	PLUS(c3,d3); PLUS(c4,d4);						\
-	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
-	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
-		ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4)		\
-	PLUS(a1,b1); PLUS(a2,b2);						\
-	PLUS(a3,b3); PLUS(a4,b4);						\
-	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
-	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
-		ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4)		\
-	PLUS(c1,d1); PLUS(c2,d2);						\
-	PLUS(c3,d3); PLUS(c4,d4);						\
-	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
-	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
-		ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4)		\
-
-.align 4
-L(__chacha20_blocks4_data_inc_counter):
-	.long 0,1,2,3
-
-.align 4
-L(__chacha20_blocks4_data_rot8):
-	.byte 3,0,1,2
-	.byte 7,4,5,6
-	.byte 11,8,9,10
-	.byte 15,12,13,14
-
-.hidden __chacha20_neon_blocks4
-ENTRY (__chacha20_neon_blocks4)
-	/* input:
-	 *	x0: input
-	 *	x1: dst
-	 *	x2: src
-	 *	x3: nblks (multiple of 4)
-	 */
-
-	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8))
-	add INPUT_CTR, INPUT, #(12*4);
-	ld1 {ROT8.16b}, [CTR];
-	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter))
-	mov INPUT_POS, INPUT;
-	ld1 {VCTR.16b}, [CTR];
-
-L(loop4):
-	/* Construct counter vectors X12 and X13 */
-
-	ld1 {X15.16b}, [INPUT_CTR];
-	mov ROUND, #20;
-	ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
-
-	dup X12.4s, X15.s[0];
-	dup X13.4s, X15.s[1];
-	ldr CTR, [INPUT_CTR];
-	add X12.4s, X12.4s, VCTR.4s;
-	dup X0.4s, VTMP1.s[0];
-	dup X1.4s, VTMP1.s[1];
-	dup X2.4s, VTMP1.s[2];
-	dup X3.4s, VTMP1.s[3];
-	dup X14.4s, X15.s[2];
-	cmhi VTMP0.4s, VCTR.4s, X12.4s;
-	dup X15.4s, X15.s[3];
-	add CTR, CTR, #4; /* Update counter */
-	dup X4.4s, VTMP2.s[0];
-	dup X5.4s, VTMP2.s[1];
-	dup X6.4s, VTMP2.s[2];
-	dup X7.4s, VTMP2.s[3];
-	sub X13.4s, X13.4s, VTMP0.4s;
-	dup X8.4s, VTMP3.s[0];
-	dup X9.4s, VTMP3.s[1];
-	dup X10.4s, VTMP3.s[2];
-	dup X11.4s, VTMP3.s[3];
-	mov X12_TMP.16b, X12.16b;
-	mov X13_TMP.16b, X13.16b;
-	str CTR, [INPUT_CTR];
-
-L(round2):
-	subs ROUND, ROUND, #2
-	QUARTERROUND4(X0, X4,  X8, X12,   X1, X5,  X9, X13,
-		      X2, X6, X10, X14,   X3, X7, X11, X15,
-		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
-	QUARTERROUND4(X0, X5, X10, X15,   X1, X6, X11, X12,
-		      X2, X7,  X8, X13,   X3, X4,  X9, X14,
-		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
-	b.ne L(round2);
-
-	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
-
-	PLUS(X12, X12_TMP);        /* INPUT + 12 * 4 + counter */
-	PLUS(X13, X13_TMP);        /* INPUT + 13 * 4 + counter */
-
-	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
-	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
-	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
-	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
-	PLUS(X0, VTMP2);
-	PLUS(X1, VTMP3);
-	PLUS(X2, X12_TMP);
-	PLUS(X3, X13_TMP);
-
-	dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
-	dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
-	dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
-	dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
-	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
-	mov INPUT_POS, INPUT;
-	PLUS(X4, VTMP2);
-	PLUS(X5, VTMP3);
-	PLUS(X6, X12_TMP);
-	PLUS(X7, X13_TMP);
-
-	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
-	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
-	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
-	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
-	dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
-	dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
-	PLUS(X8, VTMP2);
-	PLUS(X9, VTMP3);
-	PLUS(X10, X12_TMP);
-	PLUS(X11, X13_TMP);
-	PLUS(X14, VTMP0);
-	PLUS(X15, VTMP1);
-
-	transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
-
-	subs NBLKS, NBLKS, #4;
-
-	st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64
-	st1 {X1.16b,X5.16b}, [DST], #32;
-	st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64
-	st1 {X10.16b,X14.16b}, [DST], #32;
-	st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64;
-
-	b.ne L(loop4);
-
-	ret_spec_stop
-END (__chacha20_neon_blocks4)
-
-#endif
diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
deleted file mode 100644
index 37dbb917f1..0000000000
--- a/sysdeps/aarch64/chacha20_arch.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <ldsodefs.h>
-#include <stdbool.h>
-
-unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
-#ifdef __AARCH64EL__
-  __chacha20_neon_blocks4 (state, dst, src,
-			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  chacha20_crypt_generic (state, dst, src, bytes);
-#endif
-}
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
deleted file mode 100644
index 1b4559ccbc..0000000000
--- a/sysdeps/generic/chacha20_arch.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* Chacha20 implementation, generic interface for encrypt.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/generic/tls-internal-struct.h b/sysdeps/generic/tls-internal-struct.h
index a91915831b..d76c715a96 100644
--- a/sysdeps/generic/tls-internal-struct.h
+++ b/sysdeps/generic/tls-internal-struct.h
@@ -23,7 +23,6 @@ struct tls_internal_t
 {
   char *strsignal_buf;
   char *strerror_l_buf;
-  struct arc4random_state_t *rand_state;
 };
 
 #endif
diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
index 8a0f37d509..b32b31b5a9 100644
--- a/sysdeps/generic/tls-internal.c
+++ b/sysdeps/generic/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -27,13 +26,4 @@ __glibc_tls_internal_free (void)
 {
   free (__tls_internal.strsignal_buf);
   free (__tls_internal.strerror_l_buf);
-
-  if (__tls_internal.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-	 cached it won't leak any data.  */
-      explicit_bzero (__tls_internal.rand_state,
-		      sizeof (*__tls_internal.rand_state));
-      free (__tls_internal.rand_state);
-    }
 }
diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
index 667068c8cf..e60b86fab1 100644
--- a/sysdeps/mach/hurd/_Fork.c
+++ b/sysdeps/mach/hurd/_Fork.c
@@ -662,8 +662,6 @@ retry:
       _hurd_malloc_fork_child ();
       call_function_static_weak (__malloc_fork_unlock_child);
 
-      call_function_static_weak (__arc4random_fork_subprocess);
-
       /* Run things that want to run in the child task to set up.  */
       RUN_HOOK (_hurd_fork_child_hook, ());
 
diff --git a/sysdeps/mach/hurd/kernel-features.h b/sysdeps/mach/hurd/kernel-features.h
index a7579f6d68..ce97627dc8 100644
--- a/sysdeps/mach/hurd/kernel-features.h
+++ b/sysdeps/mach/hurd/kernel-features.h
@@ -21,3 +21,4 @@
    But those referring to POSIX-level features like O_* flags can be.  */
 
 #define __ASSUME_CLOSE_RANGE 1
+#define __ASSUME_GETRANDOM 1
diff --git a/sysdeps/mach/hurd/not-cancel.h b/sysdeps/mach/hurd/not-cancel.h
index 9a3a7ed59a..af5eff3559 100644
--- a/sysdeps/mach/hurd/not-cancel.h
+++ b/sysdeps/mach/hurd/not-cancel.h
@@ -77,6 +77,9 @@ __typeof (__fcntl) __fcntl_nocancel;
 #define __getrandom_nocancel(buf, size, flags) \
   __getrandom (buf, size, flags)
 
+#define __poll_infinity_nocancel(fds, nfds) \
+  __poll (fds, nfds, -1)
+
 #if IS_IN (libc)
 hidden_proto (__close_nocancel)
 hidden_proto (__close_nocancel_nostatus)
diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index 7dc02569f6..dd568992e2 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -43,8 +43,6 @@ _Fork (void)
       self->robust_head.list = &self->robust_head;
       INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
 			     sizeof (struct robust_list_head));
-
-      call_function_static_weak (__arc4random_fork_subprocess);
     }
   return pid;
 }
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
deleted file mode 100644
index 8c75165f7f..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
+++ /dev/null
@@ -1,4 +0,0 @@
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
deleted file mode 100644
index cf9e735326..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
deleted file mode 100644
index 08494dc045..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
+++ /dev/null
@@ -1,42 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-					const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-		const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  unsigned long int hwcap = GLRO(dl_hwcap);
-  unsigned long int hwcap2 = GLRO(dl_hwcap2);
-  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
-    __chacha20_power8_blocks4 (state, dst, src,
-			       CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
index abb0aa3f11..71a59529f3 100644
--- a/sysdeps/powerpc/powerpc64/power8/Makefile
+++ b/sysdeps/powerpc/powerpc64/power8/Makefile
@@ -1,8 +1,3 @@
 ifeq ($(subdir),string)
 sysdep_routines += strcasestr-ppc64
 endif
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
deleted file mode 100644
index 0bbdcb9363..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/* Optimized PowerPC implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20
-   Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <altivec.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <sys/cdefs.h>
-
-typedef vector unsigned char vector16x_u8;
-typedef vector unsigned int vector4x_u32;
-typedef vector unsigned long long vector2x_u64;
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-static const vector16x_u8 le_bswap_const =
-  { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
-#endif
-
-static inline vector4x_u32
-vec_rol_elems (vector4x_u32 v, unsigned int idx)
-{
-#if __BYTE_ORDER != __BIG_ENDIAN
-  return vec_sld (v, v, (16 - (4 * idx)) & 15);
-#else
-  return vec_sld (v, v, (4 * idx) & 15);
-#endif
-}
-
-static inline vector4x_u32
-vec_load_le (unsigned long offset, const unsigned char *ptr)
-{
-  vector4x_u32 vec;
-  vec = vec_vsx_ld (offset, (const uint32_t *)ptr);
-#if __BYTE_ORDER == __BIG_ENDIAN
-  vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec,
-				 le_bswap_const);
-#endif
-  return vec;
-}
-
-static inline void
-vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
-  vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec,
-			       le_bswap_const);
-#endif
-  vec_vsx_st (vec, offset, (uint32_t *)ptr);
-}
-
-
-static inline vector4x_u32
-vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
-  static const vector16x_u8 swap32 =
-    { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 };
-  vector2x_u64 vec, add, sum;
-
-  vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32);
-  add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32);
-  sum = vec + add;
-  return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32);
-#else
-  return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a));
-#endif
-}
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define ROTATE(v1,rolv)			\
-	__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv))
-
-#define PLUS(ds,s) \
-	((ds) += (s))
-
-#define XOR(ds,s) \
-	((ds) ^= (s))
-
-#define ADD_U64(v,a) \
-	(v = vec_add_ctr_u64(v, a))
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3) ({ \
-	vector4x_u32 t1 = vec_mergeh(x0, x2); \
-	vector4x_u32 t2 = vec_mergel(x0, x2); \
-	vector4x_u32 t3 = vec_mergeh(x1, x3); \
-	x3 = vec_mergel(x1, x3); \
-	x0 = vec_mergeh(t1, t3); \
-	x1 = vec_mergel(t1, t3); \
-	x2 = vec_mergeh(t2, x3); \
-	x3 = vec_mergel(t2, x3); \
-      })
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2)			\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE(d1, rotate_16); ROTATE(d2, rotate_16);	\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE(b1, rotate_12); ROTATE(b2, rotate_12);	\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE(d1, rotate_8); ROTATE(d2, rotate_8);		\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE(b1, rotate_7); ROTATE(b2, rotate_7);
-
-unsigned int attribute_hidden
-__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src,
-			   size_t nblks)
-{
-  vector4x_u32 counters_0123 = { 0, 1, 2, 3 };
-  vector4x_u32 counter_4 = { 4, 0, 0, 0 };
-  vector4x_u32 rotate_16 = { 16, 16, 16, 16 };
-  vector4x_u32 rotate_12 = { 12, 12, 12, 12 };
-  vector4x_u32 rotate_8 = { 8, 8, 8, 8 };
-  vector4x_u32 rotate_7 = { 7, 7, 7, 7 };
-  vector4x_u32 state0, state1, state2, state3;
-  vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7;
-  vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15;
-  vector4x_u32 tmp;
-  int i;
-
-  /* Force preload of constants to vector registers.  */
-  __asm__ ("": "+v" (counters_0123) :: "memory");
-  __asm__ ("": "+v" (counter_4) :: "memory");
-  __asm__ ("": "+v" (rotate_16) :: "memory");
-  __asm__ ("": "+v" (rotate_12) :: "memory");
-  __asm__ ("": "+v" (rotate_8) :: "memory");
-  __asm__ ("": "+v" (rotate_7) :: "memory");
-
-  state0 = vec_vsx_ld (0 * 16, state);
-  state1 = vec_vsx_ld (1 * 16, state);
-  state2 = vec_vsx_ld (2 * 16, state);
-  state3 = vec_vsx_ld (3 * 16, state);
-
-  do
-    {
-      v0 = vec_splat (state0, 0);
-      v1 = vec_splat (state0, 1);
-      v2 = vec_splat (state0, 2);
-      v3 = vec_splat (state0, 3);
-      v4 = vec_splat (state1, 0);
-      v5 = vec_splat (state1, 1);
-      v6 = vec_splat (state1, 2);
-      v7 = vec_splat (state1, 3);
-      v8 = vec_splat (state2, 0);
-      v9 = vec_splat (state2, 1);
-      v10 = vec_splat (state2, 2);
-      v11 = vec_splat (state2, 3);
-      v12 = vec_splat (state3, 0);
-      v13 = vec_splat (state3, 1);
-      v14 = vec_splat (state3, 2);
-      v15 = vec_splat (state3, 3);
-
-      v12 += counters_0123;
-      v13 -= vec_cmplt (v12, counters_0123);
-
-      for (i = 20; i > 0; i -= 2)
-	{
-	  QUARTERROUND2 (v0, v4,  v8, v12,   v1, v5,  v9, v13)
-	  QUARTERROUND2 (v2, v6, v10, v14,   v3, v7, v11, v15)
-	  QUARTERROUND2 (v0, v5, v10, v15,   v1, v6, v11, v12)
-	  QUARTERROUND2 (v2, v7,  v8, v13,   v3, v4,  v9, v14)
-	}
-
-      v0 += vec_splat (state0, 0);
-      v1 += vec_splat (state0, 1);
-      v2 += vec_splat (state0, 2);
-      v3 += vec_splat (state0, 3);
-      v4 += vec_splat (state1, 0);
-      v5 += vec_splat (state1, 1);
-      v6 += vec_splat (state1, 2);
-      v7 += vec_splat (state1, 3);
-      v8 += vec_splat (state2, 0);
-      v9 += vec_splat (state2, 1);
-      v10 += vec_splat (state2, 2);
-      v11 += vec_splat (state2, 3);
-      tmp = vec_splat( state3, 0);
-      tmp += counters_0123;
-      v12 += tmp;
-      v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123);
-      v14 += vec_splat (state3, 2);
-      v15 += vec_splat (state3, 3);
-      ADD_U64 (state3, counter_4);
-
-      transpose_4x4 (v0, v1, v2, v3);
-      transpose_4x4 (v4, v5, v6, v7);
-      transpose_4x4 (v8, v9, v10, v11);
-      transpose_4x4 (v12, v13, v14, v15);
-
-      vec_store_le (v0, (64 * 0 + 16 * 0), dst);
-      vec_store_le (v1, (64 * 1 + 16 * 0), dst);
-      vec_store_le (v2, (64 * 2 + 16 * 0), dst);
-      vec_store_le (v3, (64 * 3 + 16 * 0), dst);
-
-      vec_store_le (v4, (64 * 0 + 16 * 1), dst);
-      vec_store_le (v5, (64 * 1 + 16 * 1), dst);
-      vec_store_le (v6, (64 * 2 + 16 * 1), dst);
-      vec_store_le (v7, (64 * 3 + 16 * 1), dst);
-
-      vec_store_le (v8, (64 * 0 + 16 * 2), dst);
-      vec_store_le (v9, (64 * 1 + 16 * 2), dst);
-      vec_store_le (v10, (64 * 2 + 16 * 2), dst);
-      vec_store_le (v11, (64 * 3 + 16 * 2), dst);
-
-      vec_store_le (v12, (64 * 0 + 16 * 3), dst);
-      vec_store_le (v13, (64 * 1 + 16 * 3), dst);
-      vec_store_le (v14, (64 * 2 + 16 * 3), dst);
-      vec_store_le (v15, (64 * 3 + 16 * 3), dst);
-
-      src += 4*64;
-      dst += 4*64;
-
-      nblks -= 4;
-    }
-  while (nblks);
-
-  vec_vsx_st (state3, 3 * 16, state);
-
-  return 0;
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
deleted file mode 100644
index ded06762b6..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
+++ /dev/null
@@ -1,37 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-					const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-		const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  __chacha20_power8_blocks4 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-}
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index 96c110f490..66ed844e68 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache
 endif
 
 endif # $(subdir) == elf
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
-  chacha20-s390x \
-  # sysdep_routines
-endif
diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S
deleted file mode 100644
index e38504d370..0000000000
--- a/sysdeps/s390/s390-64/chacha20-s390x.S
+++ /dev/null
@@ -1,573 +0,0 @@
-/* Optimized s390x implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-s390x.S  -  zSeries implementation of ChaCha20 cipher
-
-   Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <sysdep.h>
-
-#ifdef HAVE_S390_VX_ASM_SUPPORT
-
-/* CFA expressions are used for pointing CFA and registers to
- * SP relative offsets. */
-# define DW_REGNO_SP 15
-
-/* Fixed length encoding used for integers for now. */
-# define DW_SLEB128_7BIT(value) \
-        0x00|((value) & 0x7f)
-# define DW_SLEB128_28BIT(value) \
-        0x80|((value)&0x7f), \
-        0x80|(((value)>>7)&0x7f), \
-        0x80|(((value)>>14)&0x7f), \
-        0x00|(((value)>>21)&0x7f)
-
-# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \
-        .cfi_escape \
-          0x0f, /* DW_CFA_def_cfa_expression */ \
-            DW_SLEB128_7BIT(11), /* length */ \
-          0x7f, /* DW_OP_breg15, rsp + constant */ \
-            DW_SLEB128_28BIT(rsp_offs), \
-          0x06, /* DW_OP_deref */ \
-          0x23, /* DW_OP_plus_constu */ \
-            DW_SLEB128_28BIT((cfa_depth)+160)
-
-.machine "z13+vx"
-.text
-
-.balign 16
-.Lconsts:
-.Lwordswap:
-	.byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3
-.Lbswap128:
-	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-.Lbswap32:
-	.byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
-.Lone:
-	.long 0, 0, 0, 1
-.Ladd_counter_0123:
-	.long 0, 1, 2, 3
-.Ladd_counter_4567:
-	.long 4, 5, 6, 7
-
-/* register macros */
-#define INPUT %r2
-#define DST   %r3
-#define SRC   %r4
-#define NBLKS %r0
-#define ROUND %r1
-
-/* stack structure */
-
-#define STACK_FRAME_STD    (8 * 16 + 8 * 4)
-#define STACK_FRAME_F8_F15 (8 * 8)
-#define STACK_FRAME_Y0_Y15 (16 * 16)
-#define STACK_FRAME_CTR    (4 * 16)
-#define STACK_FRAME_PARAMS (6 * 8)
-
-#define STACK_MAX   (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \
-		     STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \
-		     STACK_FRAME_PARAMS)
-
-#define STACK_F8     (STACK_MAX - STACK_FRAME_F8_F15)
-#define STACK_F9     (STACK_F8 + 8)
-#define STACK_F10    (STACK_F9 + 8)
-#define STACK_F11    (STACK_F10 + 8)
-#define STACK_F12    (STACK_F11 + 8)
-#define STACK_F13    (STACK_F12 + 8)
-#define STACK_F14    (STACK_F13 + 8)
-#define STACK_F15    (STACK_F14 + 8)
-#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15)
-#define STACK_CTR    (STACK_Y0_Y15 - STACK_FRAME_CTR)
-#define STACK_INPUT  (STACK_CTR - STACK_FRAME_PARAMS)
-#define STACK_DST    (STACK_INPUT + 8)
-#define STACK_SRC    (STACK_DST + 8)
-#define STACK_NBLKS  (STACK_SRC + 8)
-#define STACK_POCTX  (STACK_NBLKS + 8)
-#define STACK_POSRC  (STACK_POCTX + 8)
-
-#define STACK_G0_H3  STACK_Y0_Y15
-
-/* vector registers */
-#define A0 %v0
-#define A1 %v1
-#define A2 %v2
-#define A3 %v3
-
-#define B0 %v4
-#define B1 %v5
-#define B2 %v6
-#define B3 %v7
-
-#define C0 %v8
-#define C1 %v9
-#define C2 %v10
-#define C3 %v11
-
-#define D0 %v12
-#define D1 %v13
-#define D2 %v14
-#define D3 %v15
-
-#define E0 %v16
-#define E1 %v17
-#define E2 %v18
-#define E3 %v19
-
-#define F0 %v20
-#define F1 %v21
-#define F2 %v22
-#define F3 %v23
-
-#define G0 %v24
-#define G1 %v25
-#define G2 %v26
-#define G3 %v27
-
-#define H0 %v28
-#define H1 %v29
-#define H2 %v30
-#define H3 %v31
-
-#define IO0 E0
-#define IO1 E1
-#define IO2 E2
-#define IO3 E3
-#define IO4 F0
-#define IO5 F1
-#define IO6 F2
-#define IO7 F3
-
-#define S0 G0
-#define S1 G1
-#define S2 G2
-#define S3 G3
-
-#define TMP0 H0
-#define TMP1 H1
-#define TMP2 H2
-#define TMP3 H3
-
-#define X0 A0
-#define X1 A1
-#define X2 A2
-#define X3 A3
-#define X4 B0
-#define X5 B1
-#define X6 B2
-#define X7 B3
-#define X8 C0
-#define X9 C1
-#define X10 C2
-#define X11 C3
-#define X12 D0
-#define X13 D1
-#define X14 D2
-#define X15 D3
-
-#define Y0 E0
-#define Y1 E1
-#define Y2 E2
-#define Y3 E3
-#define Y4 F0
-#define Y5 F1
-#define Y6 F2
-#define Y7 F3
-#define Y8 G0
-#define Y9 G1
-#define Y10 G2
-#define Y11 G3
-#define Y12 H0
-#define Y13 H1
-#define Y14 H2
-#define Y15 H3
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-#define _ /*_*/
-
-#define START_STACK(last_r) \
-	lgr %r0, %r15; \
-	lghi %r1, ~15; \
-	stmg %r6, last_r, 6 * 8(%r15); \
-	aghi %r0, -STACK_MAX; \
-	ngr %r0, %r1; \
-	lgr %r1, %r15; \
-	cfi_def_cfa_register(1); \
-	lgr %r15, %r0; \
-	stg %r1, 0(%r15); \
-	cfi_cfa_on_stack(0, 0); \
-	std %f8, STACK_F8(%r15); \
-	std %f9, STACK_F9(%r15); \
-	std %f10, STACK_F10(%r15); \
-	std %f11, STACK_F11(%r15); \
-	std %f12, STACK_F12(%r15); \
-	std %f13, STACK_F13(%r15); \
-	std %f14, STACK_F14(%r15); \
-	std %f15, STACK_F15(%r15);
-
-#define END_STACK(last_r) \
-	lg %r1, 0(%r15); \
-	ld %f8, STACK_F8(%r15); \
-	ld %f9, STACK_F9(%r15); \
-	ld %f10, STACK_F10(%r15); \
-	ld %f11, STACK_F11(%r15); \
-	ld %f12, STACK_F12(%r15); \
-	ld %f13, STACK_F13(%r15); \
-	ld %f14, STACK_F14(%r15); \
-	ld %f15, STACK_F15(%r15); \
-	lmg %r6, last_r, 6 * 8(%r1); \
-	lgr %r15, %r1; \
-	cfi_def_cfa_register(DW_REGNO_SP);
-
-#define PLUS(dst,src) \
-	vaf dst, dst, src;
-
-#define XOR(dst,src) \
-	vx dst, dst, src;
-
-#define ROTATE(v1,c) \
-	verllf v1, v1, (c)(0);
-
-#define WORD_ROTATE(v1,s) \
-	vsldb v1, v1, v1, ((s) * 4);
-
-#define DST_8(OPER, I, J) \
-	OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \
-	OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J);
-
-/**********************************************************************
-  round macros
- **********************************************************************/
-
-/**********************************************************************
-  8-way chacha20 ("vertical")
- **********************************************************************/
-
-#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
-			      x8,x9,x10,x11,x12,x13,x14,x15,\
-			      y0,y1,y2,y3,y4,y5,y6,y7,\
-			      y8,y9,y10,y11,y12,y13,y14,y15,\
-			      op1,op2,op3,op4,op5,op6,op7,op8,\
-			      op9,op10,op11,op12) \
-	op1;							\
-	PLUS(x0, x1); PLUS(x4, x5);				\
-	PLUS(x8, x9); PLUS(x12, x13);				\
-	PLUS(y0, y1); PLUS(y4, y5);				\
-	PLUS(y8, y9); PLUS(y12, y13);				\
-	    op2;						\
-	    XOR(x3, x0);  XOR(x7, x4);				\
-	    XOR(x11, x8); XOR(x15, x12);			\
-	    XOR(y3, y0);  XOR(y7, y4);				\
-	    XOR(y11, y8); XOR(y15, y12);			\
-		op3;						\
-		ROTATE(x3, 16); ROTATE(x7, 16);			\
-		ROTATE(x11, 16); ROTATE(x15, 16);		\
-		ROTATE(y3, 16); ROTATE(y7, 16);			\
-		ROTATE(y11, 16); ROTATE(y15, 16);		\
-	op4;							\
-	PLUS(x2, x3); PLUS(x6, x7);				\
-	PLUS(x10, x11); PLUS(x14, x15);				\
-	PLUS(y2, y3); PLUS(y6, y7);				\
-	PLUS(y10, y11); PLUS(y14, y15);				\
-	    op5;						\
-	    XOR(x1, x2); XOR(x5, x6);				\
-	    XOR(x9, x10); XOR(x13, x14);			\
-	    XOR(y1, y2); XOR(y5, y6);				\
-	    XOR(y9, y10); XOR(y13, y14);			\
-		op6;						\
-		ROTATE(x1,12); ROTATE(x5,12);			\
-		ROTATE(x9,12); ROTATE(x13,12);			\
-		ROTATE(y1,12); ROTATE(y5,12);			\
-		ROTATE(y9,12); ROTATE(y13,12);			\
-	op7;							\
-	PLUS(x0, x1); PLUS(x4, x5);				\
-	PLUS(x8, x9); PLUS(x12, x13);				\
-	PLUS(y0, y1); PLUS(y4, y5);				\
-	PLUS(y8, y9); PLUS(y12, y13);				\
-	    op8;						\
-	    XOR(x3, x0); XOR(x7, x4);				\
-	    XOR(x11, x8); XOR(x15, x12);			\
-	    XOR(y3, y0); XOR(y7, y4);				\
-	    XOR(y11, y8); XOR(y15, y12);			\
-		op9;						\
-		ROTATE(x3,8); ROTATE(x7,8);			\
-		ROTATE(x11,8); ROTATE(x15,8);			\
-		ROTATE(y3,8); ROTATE(y7,8);			\
-		ROTATE(y11,8); ROTATE(y15,8);			\
-	op10;							\
-	PLUS(x2, x3); PLUS(x6, x7);				\
-	PLUS(x10, x11); PLUS(x14, x15);				\
-	PLUS(y2, y3); PLUS(y6, y7);				\
-	PLUS(y10, y11); PLUS(y14, y15);				\
-	    op11;						\
-	    XOR(x1, x2); XOR(x5, x6);				\
-	    XOR(x9, x10); XOR(x13, x14);			\
-	    XOR(y1, y2); XOR(y5, y6);				\
-	    XOR(y9, y10); XOR(y13, y14);			\
-		op12;						\
-		ROTATE(x1,7); ROTATE(x5,7);			\
-		ROTATE(x9,7); ROTATE(x13,7);			\
-		ROTATE(y1,7); ROTATE(y5,7);			\
-		ROTATE(y9,7); ROTATE(y13,7);
-
-#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\
-			 y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \
-	QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
-			      x8,x9,x10,x11,x12,x13,x14,x15,\
-			      y0,y1,y2,y3,y4,y5,y6,y7,\
-			      y8,y9,y10,y11,y12,y13,y14,y15,\
-			      ,,,,,,,,,,,)
-
-#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \
-	  vmrhf tmp0, v0, v1;					\
-	  vmrhf tmp1, v2, v3;					\
-	  vmrlf tmp2, v0, v1;					\
-	  vmrlf   v3, v2, v3;					\
-	  vmrhf tmpa, va, vb;					\
-	  vmrhf tmpb, vc, vd;					\
-	  vmrlf tmpc, va, vb;					\
-	  vmrlf   vd, vc, vd;					\
-	  vpdi v0, tmp0, tmp1, 0;				\
-	  vpdi v1, tmp0, tmp1, 5;				\
-	  vpdi v2, tmp2,   v3, 0;				\
-	  vpdi v3, tmp2,   v3, 5;				\
-	  vpdi va, tmpa, tmpb, 0;				\
-	  vpdi vb, tmpa, tmpb, 5;				\
-	  vpdi vc, tmpc,   vd, 0;				\
-	  vpdi vd, tmpc,   vd, 5;
-
-.balign 8
-.globl __chacha20_s390x_vx_blocks8
-ENTRY (__chacha20_s390x_vx_blocks8)
-	/* input:
-	 *	%r2: input
-	 *	%r3: dst
-	 *	%r4: src
-	 *	%r5: nblks (multiple of 8)
-	 */
-
-	START_STACK(%r8);
-	lgr NBLKS, %r5;
-
-	larl %r7, .Lconsts;
-
-	/* Load counter. */
-	lg %r8, (12 * 4)(INPUT);
-	rllg %r8, %r8, 32;
-
-.balign 4
-	/* Process eight chacha20 blocks per loop. */
-.Lloop8:
-	vlm Y0, Y3, 0(INPUT);
-
-	slgfi NBLKS, 8;
-	lghi ROUND, (20 / 2);
-
-	/* Construct counter vectors X12/X13 & Y12/Y13. */
-	vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7);
-	vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7);
-	vrepf Y12, Y3, 0;
-	vrepf Y13, Y3, 1;
-	vaccf X5, Y12, X4;
-	vaccf Y5, Y12, Y4;
-	vaf X12, Y12, X4;
-	vaf Y12, Y12, Y4;
-	vaf X13, Y13, X5;
-	vaf Y13, Y13, Y5;
-
-	vrepf X0, Y0, 0;
-	vrepf X1, Y0, 1;
-	vrepf X2, Y0, 2;
-	vrepf X3, Y0, 3;
-	vrepf X4, Y1, 0;
-	vrepf X5, Y1, 1;
-	vrepf X6, Y1, 2;
-	vrepf X7, Y1, 3;
-	vrepf X8, Y2, 0;
-	vrepf X9, Y2, 1;
-	vrepf X10, Y2, 2;
-	vrepf X11, Y2, 3;
-	vrepf X14, Y3, 2;
-	vrepf X15, Y3, 3;
-
-	/* Store counters for blocks 0-7. */
-	vstm X12, X13, (STACK_CTR + 0 * 16)(%r15);
-	vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15);
-
-	vlr Y0, X0;
-	vlr Y1, X1;
-	vlr Y2, X2;
-	vlr Y3, X3;
-	vlr Y4, X4;
-	vlr Y5, X5;
-	vlr Y6, X6;
-	vlr Y7, X7;
-	vlr Y8, X8;
-	vlr Y9, X9;
-	vlr Y10, X10;
-	vlr Y11, X11;
-	vlr Y14, X14;
-	vlr Y15, X15;
-
-	/* Update and store counter. */
-	agfi %r8, 8;
-	rllg %r5, %r8, 32;
-	stg %r5, (12 * 4)(INPUT);
-
-.balign 4
-.Lround2_8:
-	QUARTERROUND4_V8(X0, X4,  X8, X12,   X1, X5,  X9, X13,
-			 X2, X6, X10, X14,   X3, X7, X11, X15,
-			 Y0, Y4,  Y8, Y12,   Y1, Y5,  Y9, Y13,
-			 Y2, Y6, Y10, Y14,   Y3, Y7, Y11, Y15);
-	QUARTERROUND4_V8(X0, X5, X10, X15,   X1, X6, X11, X12,
-			 X2, X7,  X8, X13,   X3, X4,  X9, X14,
-			 Y0, Y5, Y10, Y15,   Y1, Y6, Y11, Y12,
-			 Y2, Y7,  Y8, Y13,   Y3, Y4,  Y9, Y14);
-	brctg ROUND, .Lround2_8;
-
-	/* Store blocks 4-7. */
-	vstm Y0, Y15, STACK_Y0_Y15(%r15);
-
-	/* Load counters for blocks 0-3. */
-	vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15);
-
-	lghi ROUND, 1;
-	j .Lfirst_output_4blks_8;
-
-.balign 4
-.Lsecond_output_4blks_8:
-	/* Load blocks 4-7. */
-	vlm X0, X15, STACK_Y0_Y15(%r15);
-
-	/* Load counters for blocks 4-7. */
-	vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15);
-
-	lghi ROUND, 0;
-
-.balign 4
-	/* Output four chacha20 blocks per loop. */
-.Lfirst_output_4blks_8:
-	vlm Y12, Y15, 0(INPUT);
-	PLUS(X12, Y0);
-	PLUS(X13, Y1);
-	vrepf Y0, Y12, 0;
-	vrepf Y1, Y12, 1;
-	vrepf Y2, Y12, 2;
-	vrepf Y3, Y12, 3;
-	vrepf Y4, Y13, 0;
-	vrepf Y5, Y13, 1;
-	vrepf Y6, Y13, 2;
-	vrepf Y7, Y13, 3;
-	vrepf Y8, Y14, 0;
-	vrepf Y9, Y14, 1;
-	vrepf Y10, Y14, 2;
-	vrepf Y11, Y14, 3;
-	vrepf Y14, Y15, 2;
-	vrepf Y15, Y15, 3;
-	PLUS(X0, Y0);
-	PLUS(X1, Y1);
-	PLUS(X2, Y2);
-	PLUS(X3, Y3);
-	PLUS(X4, Y4);
-	PLUS(X5, Y5);
-	PLUS(X6, Y6);
-	PLUS(X7, Y7);
-	PLUS(X8, Y8);
-	PLUS(X9, Y9);
-	PLUS(X10, Y10);
-	PLUS(X11, Y11);
-	PLUS(X14, Y14);
-	PLUS(X15, Y15);
-
-	vl Y15, (.Lbswap32 - .Lconsts)(%r7);
-	TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7,
-			Y9, Y10, Y11, Y12, Y13, Y14);
-	TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15,
-			Y9, Y10, Y11, Y12, Y13, Y14);
-
-	vlm Y0, Y14, 0(SRC);
-	vperm X0, X0, X0, Y15;
-	vperm X1, X1, X1, Y15;
-	vperm X2, X2, X2, Y15;
-	vperm X3, X3, X3, Y15;
-	vperm X4, X4, X4, Y15;
-	vperm X5, X5, X5, Y15;
-	vperm X6, X6, X6, Y15;
-	vperm X7, X7, X7, Y15;
-	vperm X8, X8, X8, Y15;
-	vperm X9, X9, X9, Y15;
-	vperm X10, X10, X10, Y15;
-	vperm X11, X11, X11, Y15;
-	vperm X12, X12, X12, Y15;
-	vperm X13, X13, X13, Y15;
-	vperm X14, X14, X14, Y15;
-	vperm X15, X15, X15, Y15;
-	vl Y15, (15 * 16)(SRC);
-
-	XOR(Y0, X0);
-	XOR(Y1, X4);
-	XOR(Y2, X8);
-	XOR(Y3, X12);
-	XOR(Y4, X1);
-	XOR(Y5, X5);
-	XOR(Y6, X9);
-	XOR(Y7, X13);
-	XOR(Y8, X2);
-	XOR(Y9, X6);
-	XOR(Y10, X10);
-	XOR(Y11, X14);
-	XOR(Y12, X3);
-	XOR(Y13, X7);
-	XOR(Y14, X11);
-	XOR(Y15, X15);
-	vstm Y0, Y15, 0(DST);
-
-	aghi SRC, 256;
-	aghi DST, 256;
-
-	clgije ROUND, 1, .Lsecond_output_4blks_8;
-
-	clgijhe NBLKS, 8, .Lloop8;
-
-
-	END_STACK(%r8);
-	xgr %r2, %r2;
-	br %r14;
-END (__chacha20_s390x_vx_blocks8)
-
-#endif /* HAVE_S390_VX_ASM_SUPPORT */
diff --git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h
deleted file mode 100644
index 0c6abf77e8..0000000000
--- a/sysdeps/s390/s390-64/chacha20_arch.h
+++ /dev/null
@@ -1,45 +0,0 @@
-/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-#include <sys/auxv.h>
-
-unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst,
-					  const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-#ifdef HAVE_S390_VX_ASM_SUPPORT
-  _Static_assert (CHACHA20_BUFSIZE % 8 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 8");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-  if (GLRO(dl_hwcap) & HWCAP_S390_VX)
-    {
-      __chacha20_s390x_vx_blocks8 (state, dst, src,
-				   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-      return;
-    }
-#endif
-  chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/unix/sysv/linux/kernel-features.h b/sysdeps/unix/sysv/linux/kernel-features.h
index 74adc3956b..75d5f953d4 100644
--- a/sysdeps/unix/sysv/linux/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/kernel-features.h
@@ -236,4 +236,11 @@
 # define __ASSUME_FUTEX_LOCK_PI2 0
 #endif
 
+/* The getrandom() syscall was added in 3.17.  */
+#if __LINUX_KERNEL_VERSION >= 0x031100
+# define __ASSUME_GETRANDOM 1
+#else
+# define __ASSUME_GETRANDOM 0
+#endif
+
 #endif /* kernel-features.h */
diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h
index 2c58d5ae2f..4fcdf08c9a 100644
--- a/sysdeps/unix/sysv/linux/not-cancel.h
+++ b/sysdeps/unix/sysv/linux/not-cancel.h
@@ -23,6 +23,7 @@
 #include <sysdep.h>
 #include <errno.h>
 #include <unistd.h>
+#include <sys/poll.h>
 #include <sys/syscall.h>
 #include <sys/wait.h>
 #include <time.h>
@@ -70,9 +71,17 @@ __writev_nocancel_nostatus (int fd, const struct iovec *iov, int iovcnt)
 static inline int
 __getrandom_nocancel (void *buf, size_t buflen, unsigned int flags)
 {
-  return INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags);
+  return INLINE_SYSCALL_CALL (getrandom, buf, buflen, flags);
 }
 
+static inline int
+__poll_infinity_nocancel (struct pollfd *fds, nfds_t nfds)
+{
+#ifndef __NR_ppoll_time64
+# define __NR_ppoll_time64 __NR_ppoll
+#endif
+  return INLINE_SYSCALL_CALL (ppoll_time64, fds, nfds, NULL, NULL, 0);
+}
 
 /* Uncancelable fcntl.  */
 __typeof (__fcntl) __fcntl64_nocancel;
diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c
index 0326ebb767..c8a9ed2d40 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.c
+++ b/sysdeps/unix/sysv/linux/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -26,13 +25,4 @@ __glibc_tls_internal_free (void)
   struct pthread *self = THREAD_SELF;
   free (self->tls_state.strsignal_buf);
   free (self->tls_state.strerror_l_buf);
-
-  if (self->tls_state.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-         cached it won't leak any data.  */
-      explicit_bzero (self->tls_state.rand_state,
-		      sizeof (*self->tls_state.rand_state));
-      free (self->tls_state.rand_state);
-    }
 }
diff --git a/sysdeps/unix/sysv/linux/tls-internal.h b/sysdeps/unix/sysv/linux/tls-internal.h
index ebc65d896a..2ebe977802 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.h
+++ b/sysdeps/unix/sysv/linux/tls-internal.h
@@ -28,7 +28,6 @@ __glibc_tls_internal (void)
   return &THREAD_SELF->tls_state;
 }
 
-/* Reset the arc4random TCB state on fork.  */
 extern void __glibc_tls_internal_free (void) attribute_hidden;
 
 #endif
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 1178475d75..c19bef2dec 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -5,13 +5,6 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += link-defines.sym
 endif
 
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
-  chacha20-amd64-sse2 \
-  chacha20-amd64-avx2 \
-  # sysdep_routines
-endif
-
 ifeq ($(subdir),gmon)
 sysdep_routines += _mcount
 # We cannot compile _mcount.S with -pg because that would create
diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S
deleted file mode 100644
index aefd1cdbd0..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-avx2.S
+++ /dev/null
@@ -1,328 +0,0 @@
-/* Optimized AVX2 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-amd64-avx2.S  -  AVX2 implementation of ChaCha20 cipher
-
-   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-
-#ifdef PIC
-#  define rRIP (%rip)
-#else
-#  define rRIP
-#endif
-
-/* register macros */
-#define INPUT %rdi
-#define DST   %rsi
-#define SRC   %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (32)
-#define STACK_VEC_X13 (32 + STACK_VEC_X12)
-#define STACK_TMP     (32 + STACK_VEC_X13)
-#define STACK_TMP1    (32 + STACK_TMP)
-
-#define STACK_MAX     (32 + STACK_TMP1)
-
-/* vector registers */
-#define X0 %ymm0
-#define X1 %ymm1
-#define X2 %ymm2
-#define X3 %ymm3
-#define X4 %ymm4
-#define X5 %ymm5
-#define X6 %ymm6
-#define X7 %ymm7
-#define X8 %ymm8
-#define X9 %ymm9
-#define X10 %ymm10
-#define X11 %ymm11
-#define X12 %ymm12
-#define X13 %ymm13
-#define X14 %ymm14
-#define X15 %ymm15
-
-#define X0h %xmm0
-#define X1h %xmm1
-#define X2h %xmm2
-#define X3h %xmm3
-#define X4h %xmm4
-#define X5h %xmm5
-#define X6h %xmm6
-#define X7h %xmm7
-#define X8h %xmm8
-#define X9h %xmm9
-#define X10h %xmm10
-#define X11h %xmm11
-#define X12h %xmm12
-#define X13h %xmm13
-#define X14h %xmm14
-#define X15h %xmm15
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0,x1,x2,x3,t1,t2) \
-	vpunpckhdq x1, x0, t2; \
-	vpunpckldq x1, x0, x0; \
-	\
-	vpunpckldq x3, x2, t1; \
-	vpunpckhdq x3, x2, x2; \
-	\
-	vpunpckhqdq t1, x0, x1; \
-	vpunpcklqdq t1, x0, x0; \
-	\
-	vpunpckhqdq x2, t2, x3; \
-	vpunpcklqdq x2, t2, x2;
-
-/* 2x2 128-bit matrix transpose */
-#define transpose_16byte_2x2(x0,x1,t1) \
-	vmovdqa    x0, t1; \
-	vperm2i128 $0x20, x1, x0, x0; \
-	vperm2i128 $0x31, x1, t1, x1;
-
-/**********************************************************************
-  8-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp)	\
-	vpsrld $(32 - (c)), v1, tmp;	\
-	vpslld $(c), v1, v1;		\
-	vpaddb tmp, v1, v1;		\
-	vpsrld $(32 - (c)), v2, tmp;	\
-	vpslld $(c), v2, v2;		\
-	vpaddb tmp, v2, v2;
-
-#define ROTATE_SHUF_2(v1,v2,shuf)	\
-	vpshufb shuf, v1, v1;		\
-	vpshufb shuf, v2, v2;
-
-#define XOR(ds,s) \
-	vpxor s, ds, ds;
-
-#define PLUS(ds,s) \
-	vpaddd s, ds, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\
-		      interleave_op1,interleave_op2,\
-		      interleave_op3,interleave_op4)		\
-	vbroadcasti128 .Lshuf_rol16 rRIP, tmp1;			\
-		interleave_op1;					\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE_SHUF_2(d1, d2, tmp1);			\
-		interleave_op2;					\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2, 12, tmp1);				\
-	vbroadcasti128 .Lshuf_rol8 rRIP, tmp1;			\
-		interleave_op3;					\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE_SHUF_2(d1, d2, tmp1);			\
-		interleave_op4;					\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2,  7, tmp1);
-
-	.section .text.avx2, "ax", @progbits
-	.align 32
-chacha20_data:
-L(shuf_rol16):
-	.byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
-L(shuf_rol8):
-	.byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
-L(inc_counter):
-	.byte 0,1,2,3,4,5,6,7
-L(unsigned_cmp):
-	.long 0x80000000
-
-	.hidden __chacha20_avx2_blocks8
-ENTRY (__chacha20_avx2_blocks8)
-	/* input:
-	 *	%rdi: input
-	 *	%rsi: dst
-	 *	%rdx: src
-	 *	%rcx: nblks (multiple of 8)
-	 */
-	vzeroupper;
-
-	pushq %rbp;
-	cfi_adjust_cfa_offset(8);
-	cfi_rel_offset(rbp, 0)
-	movq %rsp, %rbp;
-	cfi_def_cfa_register(rbp);
-
-	subq $STACK_MAX, %rsp;
-	andq $~31, %rsp;
-
-L(loop8):
-	mov $20, ROUND;
-
-	/* Construct counter vectors X12 and X13 */
-	vpmovzxbd L(inc_counter) rRIP, X0;
-	vpbroadcastd L(unsigned_cmp) rRIP, X2;
-	vpbroadcastd (12 * 4)(INPUT), X12;
-	vpbroadcastd (13 * 4)(INPUT), X13;
-	vpaddd X0, X12, X12;
-	vpxor X2, X0, X0;
-	vpxor X2, X12, X1;
-	vpcmpgtd X1, X0, X0;
-	vpsubd X0, X13, X13;
-	vmovdqa X12, (STACK_VEC_X12)(%rsp);
-	vmovdqa X13, (STACK_VEC_X13)(%rsp);
-
-	/* Load vectors */
-	vpbroadcastd (0 * 4)(INPUT), X0;
-	vpbroadcastd (1 * 4)(INPUT), X1;
-	vpbroadcastd (2 * 4)(INPUT), X2;
-	vpbroadcastd (3 * 4)(INPUT), X3;
-	vpbroadcastd (4 * 4)(INPUT), X4;
-	vpbroadcastd (5 * 4)(INPUT), X5;
-	vpbroadcastd (6 * 4)(INPUT), X6;
-	vpbroadcastd (7 * 4)(INPUT), X7;
-	vpbroadcastd (8 * 4)(INPUT), X8;
-	vpbroadcastd (9 * 4)(INPUT), X9;
-	vpbroadcastd (10 * 4)(INPUT), X10;
-	vpbroadcastd (11 * 4)(INPUT), X11;
-	vpbroadcastd (14 * 4)(INPUT), X14;
-	vpbroadcastd (15 * 4)(INPUT), X15;
-	vmovdqa X15, (STACK_TMP)(%rsp);
-
-L(round2):
-	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X15,,,,)
-	vmovdqa (STACK_TMP)(%rsp), X15;
-	vmovdqa X8, (STACK_TMP)(%rsp);
-	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,,,,)
-	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,,,,)
-	vmovdqa (STACK_TMP)(%rsp), X8;
-	vmovdqa X15, (STACK_TMP)(%rsp);
-	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X15,,,,)
-	sub $2, ROUND;
-	jnz L(round2);
-
-	vmovdqa X8, (STACK_TMP1)(%rsp);
-
-	/* tmp := X15 */
-	vpbroadcastd (0 * 4)(INPUT), X15;
-	PLUS(X0, X15);
-	vpbroadcastd (1 * 4)(INPUT), X15;
-	PLUS(X1, X15);
-	vpbroadcastd (2 * 4)(INPUT), X15;
-	PLUS(X2, X15);
-	vpbroadcastd (3 * 4)(INPUT), X15;
-	PLUS(X3, X15);
-	vpbroadcastd (4 * 4)(INPUT), X15;
-	PLUS(X4, X15);
-	vpbroadcastd (5 * 4)(INPUT), X15;
-	PLUS(X5, X15);
-	vpbroadcastd (6 * 4)(INPUT), X15;
-	PLUS(X6, X15);
-	vpbroadcastd (7 * 4)(INPUT), X15;
-	PLUS(X7, X15);
-	transpose_4x4(X0, X1, X2, X3, X8, X15);
-	transpose_4x4(X4, X5, X6, X7, X8, X15);
-	vmovdqa (STACK_TMP1)(%rsp), X8;
-	transpose_16byte_2x2(X0, X4, X15);
-	transpose_16byte_2x2(X1, X5, X15);
-	transpose_16byte_2x2(X2, X6, X15);
-	transpose_16byte_2x2(X3, X7, X15);
-	vmovdqa (STACK_TMP)(%rsp), X15;
-	vmovdqu X0, (64 * 0 + 16 * 0)(DST)
-	vmovdqu X1, (64 * 1 + 16 * 0)(DST)
-	vpbroadcastd (8 * 4)(INPUT), X0;
-	PLUS(X8, X0);
-	vpbroadcastd (9 * 4)(INPUT), X0;
-	PLUS(X9, X0);
-	vpbroadcastd (10 * 4)(INPUT), X0;
-	PLUS(X10, X0);
-	vpbroadcastd (11 * 4)(INPUT), X0;
-	PLUS(X11, X0);
-	vmovdqa (STACK_VEC_X12)(%rsp), X0;
-	PLUS(X12, X0);
-	vmovdqa (STACK_VEC_X13)(%rsp), X0;
-	PLUS(X13, X0);
-	vpbroadcastd (14 * 4)(INPUT), X0;
-	PLUS(X14, X0);
-	vpbroadcastd (15 * 4)(INPUT), X0;
-	PLUS(X15, X0);
-	vmovdqu X2, (64 * 2 + 16 * 0)(DST)
-	vmovdqu X3, (64 * 3 + 16 * 0)(DST)
-
-	/* Update counter */
-	addq $8, (12 * 4)(INPUT);
-
-	transpose_4x4(X8, X9, X10, X11, X0, X1);
-	transpose_4x4(X12, X13, X14, X15, X0, X1);
-	vmovdqu X4, (64 * 4 + 16 * 0)(DST)
-	vmovdqu X5, (64 * 5 + 16 * 0)(DST)
-	transpose_16byte_2x2(X8, X12, X0);
-	transpose_16byte_2x2(X9, X13, X0);
-	transpose_16byte_2x2(X10, X14, X0);
-	transpose_16byte_2x2(X11, X15, X0);
-	vmovdqu X6,  (64 * 6 + 16 * 0)(DST)
-	vmovdqu X7,  (64 * 7 + 16 * 0)(DST)
-	vmovdqu X8,  (64 * 0 + 16 * 2)(DST)
-	vmovdqu X9,  (64 * 1 + 16 * 2)(DST)
-	vmovdqu X10, (64 * 2 + 16 * 2)(DST)
-	vmovdqu X11, (64 * 3 + 16 * 2)(DST)
-	vmovdqu X12, (64 * 4 + 16 * 2)(DST)
-	vmovdqu X13, (64 * 5 + 16 * 2)(DST)
-	vmovdqu X14, (64 * 6 + 16 * 2)(DST)
-	vmovdqu X15, (64 * 7 + 16 * 2)(DST)
-
-	sub $8, NBLKS;
-	lea (8 * 64)(DST), DST;
-	lea (8 * 64)(SRC), SRC;
-	jnz L(loop8);
-
-	vzeroupper;
-
-	/* eax zeroed by round loop. */
-	leave;
-	cfi_adjust_cfa_offset(-8)
-	cfi_def_cfa_register(%rsp);
-	ret;
-	int3;
-END(__chacha20_avx2_blocks8)
diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S
deleted file mode 100644
index 351a1109c6..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-sse2.S
+++ /dev/null
@@ -1,311 +0,0 @@
-/* Optimized SSE2 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-amd64-ssse3.S  -  SSSE3 implementation of ChaCha20 cipher
-
-   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-#include <isa-level.h>
-
-#if MINIMUM_X86_ISA_LEVEL <= 2
-
-#ifdef PIC
-#  define rRIP (%rip)
-#else
-#  define rRIP
-#endif
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
-        ret; int3;
-
-/* register macros */
-#define INPUT %rdi
-#define DST   %rsi
-#define SRC   %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (16)
-#define STACK_VEC_X13 (16 + STACK_VEC_X12)
-#define STACK_TMP     (16 + STACK_VEC_X13)
-#define STACK_TMP1    (16 + STACK_TMP)
-#define STACK_TMP2    (16 + STACK_TMP1)
-
-#define STACK_MAX     (16 + STACK_TMP2)
-
-/* vector registers */
-#define X0 %xmm0
-#define X1 %xmm1
-#define X2 %xmm2
-#define X3 %xmm3
-#define X4 %xmm4
-#define X5 %xmm5
-#define X6 %xmm6
-#define X7 %xmm7
-#define X8 %xmm8
-#define X9 %xmm9
-#define X10 %xmm10
-#define X11 %xmm11
-#define X12 %xmm12
-#define X13 %xmm13
-#define X14 %xmm14
-#define X15 %xmm15
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \
-	movdqa    x0, t2; \
-	punpckhdq x1, t2; \
-	punpckldq x1, x0; \
-	\
-	movdqa    x2, t1; \
-	punpckldq x3, t1; \
-	punpckhdq x3, x2; \
-	\
-	movdqa     x0, x1; \
-	punpckhqdq t1, x1; \
-	punpcklqdq t1, x0; \
-	\
-	movdqa     t2, x3; \
-	punpckhqdq x2, x3; \
-	punpcklqdq x2, t2; \
-	movdqa     t2, x2;
-
-/* fill xmm register with 32-bit value from memory */
-#define PBROADCASTD(mem32, xreg) \
-	movd mem32, xreg; \
-	pshufd $0, xreg, xreg;
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp1,tmp2)	\
-	movdqa v1, tmp1; 		\
-	movdqa v2, tmp2; 		\
-	psrld $(32 - (c)), v1;		\
-	pslld $(c), tmp1;		\
-	paddb tmp1, v1;			\
-	psrld $(32 - (c)), v2;		\
-	pslld $(c), tmp2;		\
-	paddb tmp2, v2;
-
-#define XOR(ds,s) \
-	pxor s, ds;
-
-#define PLUS(ds,s) \
-	paddd s, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2)	\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE2(d1, d2, 16, tmp1, tmp2);			\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2, 12, tmp1, tmp2);			\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE2(d1, d2, 8, tmp1, tmp2);			\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2,  7, tmp1, tmp2);
-
-	.section .text.sse2,"ax",@progbits
-
-chacha20_data:
-	.align 16
-L(counter1):
-	.long 1,0,0,0
-L(inc_counter):
-	.long 0,1,2,3
-L(unsigned_cmp):
-	.long 0x80000000,0x80000000,0x80000000,0x80000000
-
-	.hidden __chacha20_sse2_blocks4
-ENTRY (__chacha20_sse2_blocks4)
-	/* input:
-	 *	%rdi: input
-	 *	%rsi: dst
-	 *	%rdx: src
-	 *	%rcx: nblks (multiple of 4)
-	 */
-
-	pushq %rbp;
-	cfi_adjust_cfa_offset(8);
-	cfi_rel_offset(rbp, 0)
-	movq %rsp, %rbp;
-	cfi_def_cfa_register(%rbp);
-
-	subq $STACK_MAX, %rsp;
-	andq $~15, %rsp;
-
-L(loop4):
-	mov $20, ROUND;
-
-	/* Construct counter vectors X12 and X13 */
-	movdqa L(inc_counter) rRIP, X0;
-	movdqa L(unsigned_cmp) rRIP, X2;
-	PBROADCASTD((12 * 4)(INPUT), X12);
-	PBROADCASTD((13 * 4)(INPUT), X13);
-	paddd X0, X12;
-	movdqa X12, X1;
-	pxor X2, X0;
-	pxor X2, X1;
-	pcmpgtd X1, X0;
-	psubd X0, X13;
-	movdqa X12, (STACK_VEC_X12)(%rsp);
-	movdqa X13, (STACK_VEC_X13)(%rsp);
-
-	/* Load vectors */
-	PBROADCASTD((0 * 4)(INPUT), X0);
-	PBROADCASTD((1 * 4)(INPUT), X1);
-	PBROADCASTD((2 * 4)(INPUT), X2);
-	PBROADCASTD((3 * 4)(INPUT), X3);
-	PBROADCASTD((4 * 4)(INPUT), X4);
-	PBROADCASTD((5 * 4)(INPUT), X5);
-	PBROADCASTD((6 * 4)(INPUT), X6);
-	PBROADCASTD((7 * 4)(INPUT), X7);
-	PBROADCASTD((8 * 4)(INPUT), X8);
-	PBROADCASTD((9 * 4)(INPUT), X9);
-	PBROADCASTD((10 * 4)(INPUT), X10);
-	PBROADCASTD((11 * 4)(INPUT), X11);
-	PBROADCASTD((14 * 4)(INPUT), X14);
-	PBROADCASTD((15 * 4)(INPUT), X15);
-	movdqa X11, (STACK_TMP)(%rsp);
-	movdqa X15, (STACK_TMP1)(%rsp);
-
-L(round2_4):
-	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X11,X15)
-	movdqa (STACK_TMP)(%rsp), X11;
-	movdqa (STACK_TMP1)(%rsp), X15;
-	movdqa X8, (STACK_TMP)(%rsp);
-	movdqa X9, (STACK_TMP1)(%rsp);
-	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,X9)
-	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,X9)
-	movdqa (STACK_TMP)(%rsp), X8;
-	movdqa (STACK_TMP1)(%rsp), X9;
-	movdqa X11, (STACK_TMP)(%rsp);
-	movdqa X15, (STACK_TMP1)(%rsp);
-	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X11,X15)
-	sub $2, ROUND;
-	jnz L(round2_4);
-
-	/* tmp := X15 */
-	movdqa (STACK_TMP)(%rsp), X11;
-	PBROADCASTD((0 * 4)(INPUT), X15);
-	PLUS(X0, X15);
-	PBROADCASTD((1 * 4)(INPUT), X15);
-	PLUS(X1, X15);
-	PBROADCASTD((2 * 4)(INPUT), X15);
-	PLUS(X2, X15);
-	PBROADCASTD((3 * 4)(INPUT), X15);
-	PLUS(X3, X15);
-	PBROADCASTD((4 * 4)(INPUT), X15);
-	PLUS(X4, X15);
-	PBROADCASTD((5 * 4)(INPUT), X15);
-	PLUS(X5, X15);
-	PBROADCASTD((6 * 4)(INPUT), X15);
-	PLUS(X6, X15);
-	PBROADCASTD((7 * 4)(INPUT), X15);
-	PLUS(X7, X15);
-	PBROADCASTD((8 * 4)(INPUT), X15);
-	PLUS(X8, X15);
-	PBROADCASTD((9 * 4)(INPUT), X15);
-	PLUS(X9, X15);
-	PBROADCASTD((10 * 4)(INPUT), X15);
-	PLUS(X10, X15);
-	PBROADCASTD((11 * 4)(INPUT), X15);
-	PLUS(X11, X15);
-	movdqa (STACK_VEC_X12)(%rsp), X15;
-	PLUS(X12, X15);
-	movdqa (STACK_VEC_X13)(%rsp), X15;
-	PLUS(X13, X15);
-	movdqa X13, (STACK_TMP)(%rsp);
-	PBROADCASTD((14 * 4)(INPUT), X15);
-	PLUS(X14, X15);
-	movdqa (STACK_TMP1)(%rsp), X15;
-	movdqa X14, (STACK_TMP1)(%rsp);
-	PBROADCASTD((15 * 4)(INPUT), X13);
-	PLUS(X15, X13);
-	movdqa X15, (STACK_TMP2)(%rsp);
-
-	/* Update counter */
-	addq $4, (12 * 4)(INPUT);
-
-	TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15);
-	movdqu X0, (64 * 0 + 16 * 0)(DST)
-	movdqu X1, (64 * 1 + 16 * 0)(DST)
-	movdqu X2, (64 * 2 + 16 * 0)(DST)
-	movdqu X3, (64 * 3 + 16 * 0)(DST)
-	TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2);
-	movdqa (STACK_TMP)(%rsp), X13;
-	movdqa (STACK_TMP1)(%rsp), X14;
-	movdqa (STACK_TMP2)(%rsp), X15;
-	movdqu X4, (64 * 0 + 16 * 1)(DST)
-	movdqu X5, (64 * 1 + 16 * 1)(DST)
-	movdqu X6, (64 * 2 + 16 * 1)(DST)
-	movdqu X7, (64 * 3 + 16 * 1)(DST)
-	TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2);
-	movdqu X8,  (64 * 0 + 16 * 2)(DST)
-	movdqu X9,  (64 * 1 + 16 * 2)(DST)
-	movdqu X10, (64 * 2 + 16 * 2)(DST)
-	movdqu X11, (64 * 3 + 16 * 2)(DST)
-	TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2);
-	movdqu X12, (64 * 0 + 16 * 3)(DST)
-	movdqu X13, (64 * 1 + 16 * 3)(DST)
-	movdqu X14, (64 * 2 + 16 * 3)(DST)
-	movdqu X15, (64 * 3 + 16 * 3)(DST)
-
-	sub $4, NBLKS;
-	lea (4 * 64)(DST), DST;
-	lea (4 * 64)(SRC), SRC;
-	jnz L(loop4);
-
-	/* eax zeroed by round loop. */
-	leave;
-	cfi_adjust_cfa_offset(-8)
-	cfi_def_cfa_register(%rsp);
-	ret_spec_stop;
-END (__chacha20_sse2_blocks4)
-
-#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */
diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
deleted file mode 100644
index 6f3784e392..0000000000
--- a/sysdeps/x86_64/chacha20_arch.h
+++ /dev/null
@@ -1,55 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <isa-level.h>
-#include <ldsodefs.h>
-#include <cpu-features.h>
-#include <sys/param.h>
-
-unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4 or 8");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-#if MINIMUM_X86_ISA_LEVEL > 2
-  __chacha20_avx2_blocks8 (state, dst, src,
-			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  const struct cpu_features* cpu_features = __get_cpu_features ();
-
-  /* AVX2 version uses vzeroupper, so disable it if RTM is enabled.  */
-  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
-      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !))
-    __chacha20_avx2_blocks8 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    __chacha20_sse2_blocks4 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#endif
-}
-- 
2.35.1



* [PATCH v6] arc4random: simplify design for better safety
  2022-07-26 19:08       ` [PATCH v5] " Jason A. Donenfeld
@ 2022-07-26 19:58         ` Jason A. Donenfeld
  2022-07-26 20:17           ` Adhemerval Zanella Netto
  2022-07-28 10:29           ` Szabolcs Nagy
  0 siblings, 2 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 19:58 UTC (permalink / raw)
  To: libc-alpha, adhemerval.zanella
  Cc: Jason A. Donenfeld, Florian Weimer, Cristian Rodríguez,
	Paul Eggert, Mark Harris, Eric Biggers, linux-crypto

Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtless slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc can work
together on optimizing that.
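
As a rough illustration of what "a basic loop wrapper around
getrandom()" can look like, here is a minimal sketch. It is not the
code in this patch: the helper name fill_from_getrandom is
hypothetical, it uses the public getrandom() from <sys/random.h> rather
than glibc's internal non-cancellable variant, and it reports errors to
the caller instead of aborting the process.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper, not the glibc implementation: fill BUF with LEN
   bytes from getrandom (flags == 0), retrying on EINTR and continuing
   after short reads.  Returns 0 on success, or -1 with errno set
   (for example ENOSYS on kernels without the syscall).  */
static int
fill_from_getrandom (void *buf, size_t len)
{
  uint8_t *p = buf;

  while (len > 0)
    {
      ssize_t l = getrandom (p, len, 0);
      if (l < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted before any data; retry.  */
          return -1;            /* Real failure; let the caller decide.  */
        }
      if (l == 0)
        {
          errno = EIO;          /* Unexpected; avoid spinning forever.  */
          return -1;
        }
      p += l;                   /* Short read; keep going.  */
      len -= (size_t) l;
    }
  return 0;
}
```

A caller that sees -1 with errno set to ENOSYS would then take the
/dev/urandom path described further down.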

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering
altogether. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.

This patch might be considered along the lines of, "premature
optimization is the root of all evil," in that the much more complex
implementation it replaces moves too fast without considering security
implications, whereas the incremental approach done here is a much safer
way of going about things. Once this lands, we can take our time in
optimizing this properly using new interplay between the kernel and
userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fall back to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice.
Additionally, as part of the fallback, in order to get more or less the
same properties as getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" was initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, though it does to all kernels supported by glibc
(≥3.2), so generally it's the best approximation we can do.

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 LICENSES                                      |  23 -
 NEWS                                          |   4 +-
 include/stdlib.h                              |   3 -
 manual/math.texi                              |  13 +-
 stdlib/Makefile                               |   2 -
 stdlib/arc4random.c                           | 196 ++----
 stdlib/arc4random.h                           |  48 --
 stdlib/chacha20.c                             | 191 ------
 stdlib/tst-arc4random-chacha20.c              | 167 -----
 sysdeps/aarch64/Makefile                      |   4 -
 sysdeps/aarch64/chacha20-aarch64.S            | 314 ----------
 sysdeps/aarch64/chacha20_arch.h               |  40 --
 sysdeps/generic/chacha20_arch.h               |  24 -
 sysdeps/generic/not-cancel.h                  |   3 +
 sysdeps/generic/tls-internal-struct.h         |   1 -
 sysdeps/generic/tls-internal.c                |  10 -
 sysdeps/mach/hurd/_Fork.c                     |   2 -
 sysdeps/mach/hurd/not-cancel.h                |   4 +
 sysdeps/nptl/_Fork.c                          |   2 -
 .../powerpc/powerpc64/be/multiarch/Makefile   |   4 -
 .../powerpc64/be/multiarch/chacha20-ppc.c     |   1 -
 .../powerpc64/be/multiarch/chacha20_arch.h    |  42 --
 sysdeps/powerpc/powerpc64/power8/Makefile     |   5 -
 .../powerpc/powerpc64/power8/chacha20-ppc.c   | 256 --------
 .../powerpc/powerpc64/power8/chacha20_arch.h  |  37 --
 sysdeps/s390/s390-64/Makefile                 |   6 -
 sysdeps/s390/s390-64/chacha20-s390x.S         | 573 ------------------
 sysdeps/s390/s390-64/chacha20_arch.h          |  45 --
 sysdeps/unix/sysv/linux/not-cancel.h          |   8 +-
 sysdeps/unix/sysv/linux/tls-internal.c        |  10 -
 sysdeps/unix/sysv/linux/tls-internal.h        |   1 -
 sysdeps/x86_64/Makefile                       |   7 -
 sysdeps/x86_64/chacha20-amd64-avx2.S          | 328 ----------
 sysdeps/x86_64/chacha20-amd64-sse2.S          | 311 ----------
 sysdeps/x86_64/chacha20_arch.h                |  55 --
 35 files changed, 64 insertions(+), 2676 deletions(-)
 delete mode 100644 stdlib/arc4random.h
 delete mode 100644 stdlib/chacha20.c
 delete mode 100644 stdlib/tst-arc4random-chacha20.c
 delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S
 delete mode 100644 sysdeps/aarch64/chacha20_arch.h
 delete mode 100644 sysdeps/generic/chacha20_arch.h
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
 delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
 delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
 delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
 delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
 delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
 delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
 delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
 delete mode 100644 sysdeps/x86_64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index cd04fb6e84..530893b1dc 100644
--- a/LICENSES
+++ b/LICENSES
@@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov>
  You should have received a copy of the GNU Lesser General Public
  License along with this library; if not, see
  <https://www.gnu.org/licenses/>.  */
-\f
-sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S,
-sysdeps/x86_64/chacha20-amd64-avx2.S, and
-sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and
-sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt,
-with the following notices:
-
-Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-This file is part of Libgcrypt.
-
-Libgcrypt is free software; you can redistribute it and/or modify
-it under the terms of the GNU Lesser General Public License as
-published by the Free Software Foundation; either version 2.1 of
-the License, or (at your option) any later version.
-
-Libgcrypt is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU Lesser General Public License for more details.
-
-You should have received a copy of the GNU Lesser General Public
-License along with this program; if not, see <https://www.gnu.org/licenses/>.
diff --git a/NEWS b/NEWS
index 8420a65cd0..fe531bfe1e 100644
--- a/NEWS
+++ b/NEWS
@@ -61,8 +61,8 @@ Major new features:
   is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type).
 
 * The functions arc4random, arc4random_buf, and arc4random_uniform have been
-  added.  The functions use a pseudo-random number generator along with
-  entropy from the kernel.
+  added.  The functions wrap getrandom and/or /dev/urandom to return high-
+  quality randomness from the kernel.
 
 Deprecated and removed features, and other changes affecting compatibility:
 
diff --git a/include/stdlib.h b/include/stdlib.h
index cae7f7cdf8..db51f4a4f6 100644
--- a/include/stdlib.h
+++ b/include/stdlib.h
@@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform;
 libc_hidden_proto (__arc4random_uniform);
 extern void __arc4random_buf_internal (void *buffer, size_t len)
      attribute_hidden;
-/* Called from the fork function to reinitialize the internal cipher state
-   in child process.  */
-extern void __arc4random_fork_subprocess (void) attribute_hidden;
 
 extern double __strtod_internal (const char *__restrict __nptr,
 				 char **__restrict __endptr, int __group)
diff --git a/manual/math.texi b/manual/math.texi
index 141695cc30..6d69bbff66 100644
--- a/manual/math.texi
+++ b/manual/math.texi
@@ -1993,17 +1993,10 @@ This section describes the random number functions provided as a GNU
 extension, based on OpenBSD interfaces.
 
 @Theglibc{} uses kernel entropy obtained either through @code{getrandom}
-or by reading @file{/dev/urandom} to seed and periodically re-seed the
-internal state.  A per-thread data pool is used, which allows fast output
-generation.
+or by reading @file{/dev/urandom} to seed.
 
-Although these functions provide higher random quality than ISO, BSD, and
-SVID functions, these still use a Pseudo-Random generator and should not
-be used in cryptographic contexts.
-
-The internal state is cleared and reseeded with kernel entropy on @code{fork}
-and @code{_Fork}.  It is not cleared on either a direct @code{clone} syscall
-or when using @theglibc{} @code{syscall} function.
+These functions provide higher random quality than ISO, BSD, and SVID
+functions, and may be used in cryptographic contexts.
 
 The prototypes for these functions are in @file{stdlib.h}.
 @pindex stdlib.h
diff --git a/stdlib/Makefile b/stdlib/Makefile
index a900962685..f7b25c1981 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -246,7 +246,6 @@ tests := \
   # tests
 
 tests-internal := \
-  tst-arc4random-chacha20 \
   tst-strtod1i \
   tst-strtod3 \
   tst-strtod4 \
@@ -256,7 +255,6 @@ tests-internal := \
   # tests-internal
 
 tests-static := \
-  tst-arc4random-chacha20 \
   tst-secure-getenv \
   # tests-static
 
diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
index 65547e79aa..0cb9991328 100644
--- a/stdlib/arc4random.c
+++ b/stdlib/arc4random.c
@@ -1,4 +1,4 @@
-/* Pseudo Random Number Generator based on ChaCha20.
+/* Pseudo Random Number Generator
    Copyright (C) 2022 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <arc4random.h>
 #include <errno.h>
 #include <not-cancel.h>
 #include <stdio.h>
@@ -24,53 +23,6 @@
 #include <sys/mman.h>
 #include <sys/param.h>
 #include <sys/random.h>
-#include <tls-internal.h>
-
-/* arc4random keeps two counters: 'have' is the current valid bytes not yet
-   consumed in 'buf' while 'count' is the maximum number of bytes until a
-   reseed.
-
-   Both the initial seed and reseed try to obtain entropy from the kernel
-   and abort the process if none could be obtained.
-
-   The state 'buf' improves the usage of the cipher calls, allowing to call
-   optimized implementations (if the architecture provides it) and minimize
-   function call overhead.  */
-
-#include <chacha20.c>
-
-/* Called from the fork function to reset the state.  */
-void
-__arc4random_fork_subprocess (void)
-{
-  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
-  if (state != NULL)
-    {
-      explicit_bzero (state, sizeof (*state));
-      /* Force key init.  */
-      state->count = -1;
-    }
-}
-
-/* Return the current thread random state or try to create one if there is
-   none available.  In the case malloc can not allocate a state, arc4random
-   will try to get entropy with arc4random_getentropy.  */
-static struct arc4random_state_t *
-arc4random_get_state (void)
-{
-  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
-  if (state == NULL)
-    {
-      state = malloc (sizeof (struct arc4random_state_t));
-      if (state != NULL)
-	{
-	  /* Force key initialization on first call.  */
-	  state->count = -1;
-	  __glibc_tls_internal ()->rand_state = state;
-	}
-    }
-  return state;
-}
 
 static void
 arc4random_getrandom_failure (void)
@@ -78,106 +30,63 @@ arc4random_getrandom_failure (void)
   __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
 }
 
-static void
-arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen)
+void
+__arc4random_buf (void *p, size_t n)
 {
-  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
-
-  /* Mix optional user provided data.  */
-  if (rnd != NULL)
-    {
-      size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-      for (size_t i = 0; i < m; i++)
-	state->buf[i] ^= rnd[i];
-    }
-
-  /* Immediately reinit for backtracking resistance.  */
-  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
-  explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
-}
+  static int seen_initialized;
+  ssize_t l;
+  int fd;
 
-static void
-arc4random_getentropy (void *rnd, size_t len)
-{
-  if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len)
+  if (n == 0)
     return;
 
-  int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom",
-						  O_RDONLY | O_CLOEXEC));
-  if (fd != -1)
+  for (;;)
     {
-      uint8_t *p = rnd;
-      uint8_t *end = p + len;
-      do
+      l = TEMP_FAILURE_RETRY (__getrandom_nocancel (p, n, 0));
+      if (l > 0)
 	{
-	  ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
-	  if (ret <= 0)
-	    arc4random_getrandom_failure ();
-	  p += ret;
+	  if ((size_t) l == n)
+	    return; /* Done reading, success.  */
+	  p = (uint8_t *) p + l;
+	  n -= l;
+	  continue; /* Interrupted by a signal; keep going.  */
 	}
-      while (p < end);
-
-      if (__close_nocancel (fd) == 0)
-	return;
+      else if (l < 0 && errno == ENOSYS)
+	break; /* No syscall, so fall back to /dev/urandom.  */
+      arc4random_getrandom_failure ();
     }
-  arc4random_getrandom_failure ();
-}
 
-/* Check if the thread context STATE should be reseed with kernel entropy
-   depending of requested LEN bytes.  If there is less than requested,
-   the state is either initialized or reseeded, otherwise the internal
-   counter subtract the requested length.  */
-static void
-arc4random_check_stir (struct arc4random_state_t *state, size_t len)
-{
-  if (state->count <= len || state->count == -1)
+  if (!atomic_load_relaxed (&seen_initialized))
     {
-      uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
-      arc4random_getentropy (rnd, sizeof rnd);
-
-      if (state->count == -1)
-	chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
-      else
-	arc4random_rekey (state, rnd, sizeof rnd);
-
-      explicit_bzero (rnd, sizeof rnd);
-
-      /* Invalidate the buf.  */
-      state->have = 0;
-      memset (state->buf, 0, sizeof state->buf);
-      state->count = CHACHA20_RESEED_SIZE;
+      /* Poll /dev/random as an approximation of RNG initialization.  */
+      struct pollfd pfd = { .events = POLLIN };
+      pfd.fd = TEMP_FAILURE_RETRY (
+	  __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
+      if (pfd.fd < 0)
+	arc4random_getrandom_failure ();
+      if (TEMP_FAILURE_RETRY (__poll_infinity_nocancel (&pfd, 1)) < 0)
+	arc4random_getrandom_failure ();
+      if (__close_nocancel (pfd.fd) < 0)
+	arc4random_getrandom_failure ();
+      atomic_store_relaxed (&seen_initialized, 1);
     }
-  else
-    state->count -= len;
-}
 
-void
-__arc4random_buf (void *buffer, size_t len)
-{
-  struct arc4random_state_t *state = arc4random_get_state ();
-  if (__glibc_unlikely (state == NULL))
-    {
-      arc4random_getentropy (buffer, len);
-      return;
-    }
-
-  arc4random_check_stir (state, len);
-  while (len > 0)
+  fd = TEMP_FAILURE_RETRY (
+      __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY));
+  if (fd < 0)
+    arc4random_getrandom_failure ();
+  for (;;)
     {
-      if (state->have > 0)
-	{
-	  size_t m = MIN (len, state->have);
-	  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
-	  memcpy (buffer, ks, m);
-	  explicit_bzero (ks, m);
-	  buffer += m;
-	  len -= m;
-	  state->have -= m;
-	}
-      if (state->have == 0)
-	arc4random_rekey (state, NULL, 0);
+      l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
+      if (l <= 0)
+	arc4random_getrandom_failure ();
+      if ((size_t) l == n)
+	break; /* Done reading, success.  */
+      p = (uint8_t *) p + l;
+      n -= l;
     }
+  if (__close_nocancel (fd) < 0)
+    arc4random_getrandom_failure ();
 }
 libc_hidden_def (__arc4random_buf)
 weak_alias (__arc4random_buf, arc4random_buf)
@@ -186,22 +95,7 @@ uint32_t
 __arc4random (void)
 {
   uint32_t r;
-
-  struct arc4random_state_t *state = arc4random_get_state ();
-  if (__glibc_unlikely (state == NULL))
-    {
-      arc4random_getentropy (&r, sizeof (uint32_t));
-      return r;
-    }
-
-  arc4random_check_stir (state, sizeof (uint32_t));
-  if (state->have < sizeof (uint32_t))
-    arc4random_rekey (state, NULL, 0);
-  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
-  memcpy (&r, ks, sizeof (uint32_t));
-  memset (ks, 0, sizeof (uint32_t));
-  state->have -= sizeof (uint32_t);
-
+  __arc4random_buf (&r, sizeof (r));
   return r;
 }
 libc_hidden_def (__arc4random)
diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
deleted file mode 100644
index cd39389c19..0000000000
--- a/stdlib/arc4random.h
+++ /dev/null
@@ -1,48 +0,0 @@
-/* Arc4random definition used on TLS.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#ifndef _CHACHA20_H
-#define _CHACHA20_H
-
-#include <stddef.h>
-#include <stdint.h>
-
-/* Internal ChaCha20 state.  */
-#define CHACHA20_STATE_LEN	16
-#define CHACHA20_BLOCK_SIZE	64
-
-/* Maximum number bytes until reseed (16 MB).  */
-#define CHACHA20_RESEED_SIZE	(16 * 1024 * 1024)
-
-/* Internal arc4random buffer, used on each feedback step so offer some
-   backtracking protection and to allow better used of vectorized
-   chacha20 implementations.  */
-#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
-
-_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE,
-		"CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE");
-
-struct arc4random_state_t
-{
-  uint32_t ctx[CHACHA20_STATE_LEN];
-  size_t have;
-  size_t count;
-  uint8_t buf[CHACHA20_BUFSIZE];
-};
-
-#endif
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
deleted file mode 100644
index 2745a81315..0000000000
--- a/stdlib/chacha20.c
+++ /dev/null
@@ -1,191 +0,0 @@
-/* Generic ChaCha20 implementation (used on arc4random).
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <array_length.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <string.h>
-
-/* 32-bit stream position, then 96-bit nonce.  */
-#define CHACHA20_IV_SIZE	16
-#define CHACHA20_KEY_SIZE	32
-
-#define CHACHA20_STATE_LEN	16
-
-/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final
-   XOR of the keystream with the plaintext because the plaintext is a
-   stream of zeros.  */
-
-enum chacha20_constants
-{
-  CHACHA20_CONSTANT_EXPA = 0x61707865U,
-  CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
-  CHACHA20_CONSTANT_2_BY = 0x79622d32U,
-  CHACHA20_CONSTANT_TE_K = 0x6b206574U
-};
-
-static inline uint32_t
-read_unaligned_32 (const uint8_t *p)
-{
-  uint32_t r;
-  memcpy (&r, p, sizeof (r));
-  return r;
-}
-
-static inline void
-write_unaligned_32 (uint8_t *p, uint32_t v)
-{
-  memcpy (p, &v, sizeof (v));
-}
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
-# define set_state(v)		__builtin_bswap32 ((v))
-#else
-# define read_unaligned_le32(p) read_unaligned_32 ((p))
-# define set_state(v)		(v)
-#endif
-
-static inline void
-chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
-{
-  state[0]  = CHACHA20_CONSTANT_EXPA;
-  state[1]  = CHACHA20_CONSTANT_ND_3;
-  state[2]  = CHACHA20_CONSTANT_2_BY;
-  state[3]  = CHACHA20_CONSTANT_TE_K;
-
-  state[4]  = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
-  state[5]  = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
-  state[6]  = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
-  state[7]  = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
-  state[8]  = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
-  state[9]  = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
-  state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
-  state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
-
-  state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
-  state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
-  state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
-  state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
-}
-
-static inline uint32_t
-rotl32 (unsigned int shift, uint32_t word)
-{
-  return (word << (shift & 31)) | (word >> ((-shift) & 31));
-}
-
-static void
-state_final (const uint8_t *src, uint8_t *dst, uint32_t v)
-{
-#ifdef CHACHA20_XOR_FINAL
-  v ^= read_unaligned_32 (src);
-#endif
-  write_unaligned_32 (dst, v);
-}
-
-static inline void
-chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src)
-{
-  uint32_t x0, x1, x2, x3, x4, x5, x6, x7;
-  uint32_t x8, x9, x10, x11, x12, x13, x14, x15;
-
-  x0 = state[0];
-  x1 = state[1];
-  x2 = state[2];
-  x3 = state[3];
-  x4 = state[4];
-  x5 = state[5];
-  x6 = state[6];
-  x7 = state[7];
-  x8 = state[8];
-  x9 = state[9];
-  x10 = state[10];
-  x11 = state[11];
-  x12 = state[12];
-  x13 = state[13];
-  x14 = state[14];
-  x15 = state[15];
-
-  for (int i = 0; i < 20; i += 2)
-    {
-#define QROUND(_x0, _x1, _x2, _x3) 			\
-  do {							\
-   _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); 	\
-   _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); 	\
-   _x0 = _x0 + _x1; _x3 = rotl32 (8,  (_x0 ^ _x3));	\
-   _x2 = _x2 + _x3; _x1 = rotl32 (7,  (_x1 ^ _x2));	\
-  } while(0)
-
-      QROUND (x0, x4, x8,  x12);
-      QROUND (x1, x5, x9,  x13);
-      QROUND (x2, x6, x10, x14);
-      QROUND (x3, x7, x11, x15);
-
-      QROUND (x0, x5, x10, x15);
-      QROUND (x1, x6, x11, x12);
-      QROUND (x2, x7, x8,  x13);
-      QROUND (x3, x4, x9,  x14);
-    }
-
-  state_final (&src[0], &dst[0], set_state (x0 + state[0]));
-  state_final (&src[4], &dst[4], set_state (x1 + state[1]));
-  state_final (&src[8], &dst[8], set_state (x2 + state[2]));
-  state_final (&src[12], &dst[12], set_state (x3 + state[3]));
-  state_final (&src[16], &dst[16], set_state (x4 + state[4]));
-  state_final (&src[20], &dst[20], set_state (x5 + state[5]));
-  state_final (&src[24], &dst[24], set_state (x6 + state[6]));
-  state_final (&src[28], &dst[28], set_state (x7 + state[7]));
-  state_final (&src[32], &dst[32], set_state (x8 + state[8]));
-  state_final (&src[36], &dst[36], set_state (x9 + state[9]));
-  state_final (&src[40], &dst[40], set_state (x10 + state[10]));
-  state_final (&src[44], &dst[44], set_state (x11 + state[11]));
-  state_final (&src[48], &dst[48], set_state (x12 + state[12]));
-  state_final (&src[52], &dst[52], set_state (x13 + state[13]));
-  state_final (&src[56], &dst[56], set_state (x14 + state[14]));
-  state_final (&src[60], &dst[60], set_state (x15 + state[15]));
-
-  state[12]++;
-}
-
-static void
-__attribute_maybe_unused__
-chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
-			size_t bytes)
-{
-  while (bytes >= CHACHA20_BLOCK_SIZE)
-    {
-      chacha20_block (state, dst, src);
-
-      bytes -= CHACHA20_BLOCK_SIZE;
-      dst += CHACHA20_BLOCK_SIZE;
-      src += CHACHA20_BLOCK_SIZE;
-    }
-
-  if (__glibc_unlikely (bytes != 0))
-    {
-      uint8_t stream[CHACHA20_BLOCK_SIZE];
-      chacha20_block (state, stream, src);
-      memcpy (dst, stream, bytes);
-      explicit_bzero (stream, sizeof stream);
-    }
-}
-
-/* Get the architecture optimized version.  */
-#include <chacha20_arch.h>
diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c
deleted file mode 100644
index 45ba54920d..0000000000
--- a/stdlib/tst-arc4random-chacha20.c
+++ /dev/null
@@ -1,167 +0,0 @@
-/* Basic tests for chacha20 cypher used in arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <arc4random.h>
-#include <support/check.h>
-#include <sys/cdefs.h>
-
-/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random
-   actual does.  */
-#include <chacha20.c>
-
-static int
-do_test (void)
-{
-  const uint8_t key[CHACHA20_KEY_SIZE] =
-    {
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-    };
-  const uint8_t iv[CHACHA20_IV_SIZE] =
-    {
-      0x0, 0x0, 0x0, 0x0,
-      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
-    };
-  const uint8_t expected1[CHACHA20_BUFSIZE] =
-    {
-      0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a,
-      0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d,
-      0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda,
-      0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f,
-      0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1,
-      0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07,
-      0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73,
-      0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69,
-      0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21,
-      0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8,
-      0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac,
-      0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6,
-      0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0,
-      0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3,
-      0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f,
-      0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e,
-      0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71,
-      0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3,
-      0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5,
-      0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34,
-      0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d,
-      0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c,
-      0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31,
-      0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad,
-      0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0,
-      0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72,
-      0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16,
-      0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9,
-      0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a,
-      0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc,
-      0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2,
-      0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb,
-      0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10,
-      0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b,
-      0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2,
-      0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17,
-      0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca,
-      0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84,
-      0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3,
-      0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1,
-      0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7,
-      0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9,
-      0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3,
-      0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e,
-      0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb,
-      0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8,
-      0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb
-    };
-
-  const uint8_t expected2[CHACHA20_BUFSIZE] =
-    {
-      0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36,
-      0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22,
-      0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50,
-      0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10,
-      0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef,
-      0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30,
-      0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c,
-      0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6,
-      0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e,
-      0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90,
-      0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3,
-      0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba,
-      0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4,
-      0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea,
-      0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a,
-      0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80,
-      0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2,
-      0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8,
-      0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce,
-      0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb,
-      0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9,
-      0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc,
-      0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86,
-      0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0,
-      0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4,
-      0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1,
-      0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4,
-      0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f,
-      0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57,
-      0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97,
-      0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef,
-      0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13,
-      0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65,
-      0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30,
-      0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6,
-      0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee,
-      0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c,
-      0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66,
-      0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a,
-      0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c,
-      0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65,
-      0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b,
-      0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88,
-      0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1,
-      0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0,
-      0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36,
-      0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18
-    };
-
-  /* Check with the expected internal arc4random keystream buffer.  Some
-     architecture optimizations expect a buffer with a minimum size which
-     is a multiple of the ChaCha20 blocksize, so they might not be prepared
-     to handle smaller buffers.  */
-
-  uint8_t output[CHACHA20_BUFSIZE];
-
-  uint32_t state[CHACHA20_STATE_LEN];
-  chacha20_init (state, key, iv);
-
-  /* Check with the initial state.  */
-  uint8_t input[CHACHA20_BUFSIZE] = { 0 };
-
-  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
-  TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE);
-
-  /* And on the next round.  */
-  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
-  TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE);
-
-  return 0;
-}
-
-#include <support/test-driver.c>
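
[Note on the removed test above: expected1 starts with the well-known
ChaCha20 keystream for an all-zero key and all-zero IV (76 b8 e0 ad ...).
The following is a minimal portable sketch of a single ChaCha20 block that
reproduces those bytes. It is not the glibc code being removed here; the
name sketch_chacha20_block and the 16-byte IV layout (state words 12..15
loaded directly from the IV, as the test's iv array suggests) are
assumptions made for illustration only.]

```c
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* One ChaCha quarter round on four 32-bit words.  */
#define QUARTERROUND(a, b, c, d)            \
  do {                                      \
    a += b; d ^= a; d = ROTL32 (d, 16);     \
    c += d; b ^= c; b = ROTL32 (b, 12);     \
    a += b; d ^= a; d = ROTL32 (d, 8);      \
    c += d; b ^= c; b = ROTL32 (b, 7);      \
  } while (0)

static uint32_t
load32_le (const uint8_t *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
	 | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

static void
store32_le (uint8_t *p, uint32_t v)
{
  p[0] = (uint8_t) v;
  p[1] = (uint8_t) (v >> 8);
  p[2] = (uint8_t) (v >> 16);
  p[3] = (uint8_t) (v >> 24);
}

/* Hypothetical helper (not a glibc interface): produce one 64-byte
   keystream block from a 32-byte key and a 16-byte counter+nonce IV.  */
static void
sketch_chacha20_block (const uint8_t key[32], const uint8_t iv[16],
		       uint8_t out[64])
{
  uint32_t state[16], x[16];

  /* "expand 32-byte k" constants.  */
  state[0] = 0x61707865;
  state[1] = 0x3320646e;
  state[2] = 0x79622d32;
  state[3] = 0x6b206574;
  for (int i = 0; i < 8; i++)
    state[4 + i] = load32_le (key + 4 * i);
  for (int i = 0; i < 4; i++)
    state[12 + i] = load32_le (iv + 4 * i);

  memcpy (x, state, sizeof x);

  /* 20 rounds: 10 iterations of a column round plus a diagonal round.  */
  for (int i = 0; i < 10; i++)
    {
      QUARTERROUND (x[0], x[4], x[8], x[12]);
      QUARTERROUND (x[1], x[5], x[9], x[13]);
      QUARTERROUND (x[2], x[6], x[10], x[14]);
      QUARTERROUND (x[3], x[7], x[11], x[15]);
      QUARTERROUND (x[0], x[5], x[10], x[15]);
      QUARTERROUND (x[1], x[6], x[11], x[12]);
      QUARTERROUND (x[2], x[7], x[8], x[13]);
      QUARTERROUND (x[3], x[4], x[9], x[14]);
    }

  /* Feed-forward of the input state, then little-endian serialization.  */
  for (int i = 0; i < 16; i++)
    store32_le (out + 4 * i, x[i] + state[i]);
}
```

[Calling sketch_chacha20_block with a zeroed key and IV should give the
first 64 bytes of expected1; the optimized 4-block routines below compute
four such blocks per loop iteration with consecutive counter values.]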
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 7dfd1b62dd..17fb1c5b72 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -51,10 +51,6 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += tlsdesc.sym
 endif
 
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-aarch64
-endif
-
 ifeq ($(subdir),gmon)
 CFLAGS-mcount.c += -mgeneral-regs-only
 endif
diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S
deleted file mode 100644
index cce5291c5c..0000000000
--- a/sysdeps/aarch64/chacha20-aarch64.S
+++ /dev/null
@@ -1,314 +0,0 @@
-/* Optimized AArch64 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-
-/* Only LE is supported.  */
-#ifdef __AARCH64EL__
-
-#define GET_DATA_POINTER(reg, name) \
-        adrp    reg, name ; \
-        add     reg, reg, :lo12:name
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
-        ret; dsb sy; isb;
-
-.cpu generic+simd
-
-.text
-
-/* register macros */
-#define INPUT     x0
-#define DST       x1
-#define SRC       x2
-#define NBLKS     x3
-#define ROUND     x4
-#define INPUT_CTR x5
-#define INPUT_POS x6
-#define CTR       x7
-
-/* vector registers */
-#define X0 v16
-#define X4 v17
-#define X8 v18
-#define X12 v19
-
-#define X1 v20
-#define X5 v21
-
-#define X9 v22
-#define X13 v23
-#define X2 v24
-#define X6 v25
-
-#define X3 v26
-#define X7 v27
-#define X11 v28
-#define X15 v29
-
-#define X10 v30
-#define X14 v31
-
-#define VCTR    v0
-#define VTMP0   v1
-#define VTMP1   v2
-#define VTMP2   v3
-#define VTMP3   v4
-#define X12_TMP v5
-#define X13_TMP v6
-#define ROT8    v7
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-#define _(...) __VA_ARGS__
-
-#define vpunpckldq(s1, s2, dst) \
-	zip1 dst.4s, s2.4s, s1.4s;
-
-#define vpunpckhdq(s1, s2, dst) \
-	zip2 dst.4s, s2.4s, s1.4s;
-
-#define vpunpcklqdq(s1, s2, dst) \
-	zip1 dst.2d, s2.2d, s1.2d;
-
-#define vpunpckhqdq(s1, s2, dst) \
-	zip2 dst.2d, s2.2d, s1.2d;
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
-	vpunpckhdq(x1, x0, t2); \
-	vpunpckldq(x1, x0, x0); \
-	\
-	vpunpckldq(x3, x2, t1); \
-	vpunpckhdq(x3, x2, x2); \
-	\
-	vpunpckhqdq(t1, x0, x1); \
-	vpunpcklqdq(t1, x0, x0); \
-	\
-	vpunpckhqdq(x2, t2, x3); \
-	vpunpcklqdq(x2, t2, x2);
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define XOR(d,s1,s2) \
-	eor d.16b, s2.16b, s1.16b;
-
-#define PLUS(ds,s) \
-	add ds.4s, ds.4s, s.4s;
-
-#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \
-	shl dst1.4s, src1.4s, #(c);		\
-	shl dst2.4s, src2.4s, #(c);		\
-	shl dst3.4s, src3.4s, #(c);		\
-	shl dst4.4s, src4.4s, #(c);		\
-	sri dst1.4s, src1.4s, #(32 - (c));	\
-	sri dst2.4s, src2.4s, #(32 - (c));	\
-	sri dst3.4s, src3.4s, #(32 - (c));	\
-	sri dst4.4s, src4.4s, #(32 - (c));
-
-#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
-	tbl dst1.16b, {src1.16b}, ROT8.16b;     \
-	tbl dst2.16b, {src2.16b}, ROT8.16b;	\
-	tbl dst3.16b, {src3.16b}, ROT8.16b;	\
-	tbl dst4.16b, {src4.16b}, ROT8.16b;
-
-#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
-	rev32 dst1.8h, src1.8h;			\
-	rev32 dst2.8h, src2.8h;			\
-	rev32 dst3.8h, src3.8h;			\
-	rev32 dst4.8h, src4.8h;
-
-#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \
-	PLUS(a1,b1); PLUS(a2,b2);						\
-	PLUS(a3,b3); PLUS(a4,b4);						\
-	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
-	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
-		ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4);		\
-	PLUS(c1,d1); PLUS(c2,d2);						\
-	PLUS(c3,d3); PLUS(c4,d4);						\
-	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
-	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
-		ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4)		\
-	PLUS(a1,b1); PLUS(a2,b2);						\
-	PLUS(a3,b3); PLUS(a4,b4);						\
-	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
-	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
-		ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4)		\
-	PLUS(c1,d1); PLUS(c2,d2);						\
-	PLUS(c3,d3); PLUS(c4,d4);						\
-	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
-	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
-		ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4)		\
-
-.align 4
-L(__chacha20_blocks4_data_inc_counter):
-	.long 0,1,2,3
-
-.align 4
-L(__chacha20_blocks4_data_rot8):
-	.byte 3,0,1,2
-	.byte 7,4,5,6
-	.byte 11,8,9,10
-	.byte 15,12,13,14
-
-.hidden __chacha20_neon_blocks4
-ENTRY (__chacha20_neon_blocks4)
-	/* input:
-	 *	x0: input
-	 *	x1: dst
-	 *	x2: src
-	 *	x3: nblks (multiple of 4)
-	 */
-
-	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8))
-	add INPUT_CTR, INPUT, #(12*4);
-	ld1 {ROT8.16b}, [CTR];
-	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter))
-	mov INPUT_POS, INPUT;
-	ld1 {VCTR.16b}, [CTR];
-
-L(loop4):
-	/* Construct counter vectors X12 and X13 */
-
-	ld1 {X15.16b}, [INPUT_CTR];
-	mov ROUND, #20;
-	ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
-
-	dup X12.4s, X15.s[0];
-	dup X13.4s, X15.s[1];
-	ldr CTR, [INPUT_CTR];
-	add X12.4s, X12.4s, VCTR.4s;
-	dup X0.4s, VTMP1.s[0];
-	dup X1.4s, VTMP1.s[1];
-	dup X2.4s, VTMP1.s[2];
-	dup X3.4s, VTMP1.s[3];
-	dup X14.4s, X15.s[2];
-	cmhi VTMP0.4s, VCTR.4s, X12.4s;
-	dup X15.4s, X15.s[3];
-	add CTR, CTR, #4; /* Update counter */
-	dup X4.4s, VTMP2.s[0];
-	dup X5.4s, VTMP2.s[1];
-	dup X6.4s, VTMP2.s[2];
-	dup X7.4s, VTMP2.s[3];
-	sub X13.4s, X13.4s, VTMP0.4s;
-	dup X8.4s, VTMP3.s[0];
-	dup X9.4s, VTMP3.s[1];
-	dup X10.4s, VTMP3.s[2];
-	dup X11.4s, VTMP3.s[3];
-	mov X12_TMP.16b, X12.16b;
-	mov X13_TMP.16b, X13.16b;
-	str CTR, [INPUT_CTR];
-
-L(round2):
-	subs ROUND, ROUND, #2
-	QUARTERROUND4(X0, X4,  X8, X12,   X1, X5,  X9, X13,
-		      X2, X6, X10, X14,   X3, X7, X11, X15,
-		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
-	QUARTERROUND4(X0, X5, X10, X15,   X1, X6, X11, X12,
-		      X2, X7,  X8, X13,   X3, X4,  X9, X14,
-		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
-	b.ne L(round2);
-
-	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
-
-	PLUS(X12, X12_TMP);        /* INPUT + 12 * 4 + counter */
-	PLUS(X13, X13_TMP);        /* INPUT + 13 * 4 + counter */
-
-	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
-	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
-	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
-	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
-	PLUS(X0, VTMP2);
-	PLUS(X1, VTMP3);
-	PLUS(X2, X12_TMP);
-	PLUS(X3, X13_TMP);
-
-	dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
-	dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
-	dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
-	dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
-	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
-	mov INPUT_POS, INPUT;
-	PLUS(X4, VTMP2);
-	PLUS(X5, VTMP3);
-	PLUS(X6, X12_TMP);
-	PLUS(X7, X13_TMP);
-
-	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
-	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
-	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
-	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
-	dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
-	dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
-	PLUS(X8, VTMP2);
-	PLUS(X9, VTMP3);
-	PLUS(X10, X12_TMP);
-	PLUS(X11, X13_TMP);
-	PLUS(X14, VTMP0);
-	PLUS(X15, VTMP1);
-
-	transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
-	transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
-
-	subs NBLKS, NBLKS, #4;
-
-	st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64
-	st1 {X1.16b,X5.16b}, [DST], #32;
-	st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64
-	st1 {X10.16b,X14.16b}, [DST], #32;
-	st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64;
-
-	b.ne L(loop4);
-
-	ret_spec_stop
-END (__chacha20_neon_blocks4)
-
-#endif
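
[Note on the counter handling in L(loop4) above: the add / cmhi / sub
sequence builds four per-lane 64-bit block counters from state words 12
and 13, propagating a carry into the high word when the low word wraps.
A scalar sketch of that idea follows. The function name
sketch_lane_counters and its signature are hypothetical and serve only to
illustrate the technique; this is not code from the patch.]

```c
#include <stdint.h>

/* Hypothetical helper: compute the low and high counter words for four
   consecutive blocks, then advance the 64-bit counter held in
   state[12] (low) and state[13] (high) by 4.  */
static void
sketch_lane_counters (uint32_t state[16], uint32_t lo[4], uint32_t hi[4])
{
  for (uint32_t lane = 0; lane < 4; lane++)
    {
      /* Per-lane low word: counter + {0,1,2,3}, wrapping modulo 2^32.  */
      lo[lane] = state[12] + lane;

      /* Unsigned "lane > sum" is true exactly when the addition wrapped;
	 the vector compare yields an all-ones mask in that case.  */
      uint32_t carry_mask = (lane > lo[lane]) ? UINT32_MAX : 0;

      /* Subtracting an all-ones mask adds 1 to the high word.  */
      hi[lane] = state[13] - carry_mask;
    }

  /* Update the stored counter for the next group of four blocks.  */
  uint64_t ctr = ((uint64_t) state[13] << 32) | state[12];
  ctr += 4;
  state[12] = (uint32_t) ctr;
  state[13] = (uint32_t) (ctr >> 32);
}
```

[With state[12] = 0xfffffffe and state[13] = 5, lanes 2 and 3 wrap, so
their high words become 6 while lanes 0 and 1 keep 5.]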
diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
deleted file mode 100644
index 37dbb917f1..0000000000
--- a/sysdeps/aarch64/chacha20_arch.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <ldsodefs.h>
-#include <stdbool.h>
-
-unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
-#ifdef __AARCH64EL__
-  __chacha20_neon_blocks4 (state, dst, src,
-			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  chacha20_crypt_generic (state, dst, src, bytes);
-#endif
-}
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
deleted file mode 100644
index 1b4559ccbc..0000000000
--- a/sysdeps/generic/chacha20_arch.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* Chacha20 implementation, generic interface for encrypt.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/generic/not-cancel.h b/sysdeps/generic/not-cancel.h
index acceb9b67f..b5a42c70d6 100644
--- a/sysdeps/generic/not-cancel.h
+++ b/sysdeps/generic/not-cancel.h
@@ -20,6 +20,7 @@
 # define NOT_CANCEL_H
 
 #include <fcntl.h>
+#include <poll.h>
 #include <unistd.h>
 #include <sys/wait.h>
 #include <time.h>
@@ -50,5 +51,7 @@
   __fcntl64 (fd, cmd, __VA_ARGS__)
 #define __getrandom_nocancel(buf, size, flags) \
   __getrandom (buf, size, flags)
+#define __poll_infinity_nocancel(fds, nfds) \
+  __poll (fds, nfds, -1)
 
 #endif /* NOT_CANCEL_H  */
diff --git a/sysdeps/generic/tls-internal-struct.h b/sysdeps/generic/tls-internal-struct.h
index a91915831b..d76c715a96 100644
--- a/sysdeps/generic/tls-internal-struct.h
+++ b/sysdeps/generic/tls-internal-struct.h
@@ -23,7 +23,6 @@ struct tls_internal_t
 {
   char *strsignal_buf;
   char *strerror_l_buf;
-  struct arc4random_state_t *rand_state;
 };
 
 #endif
diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
index 8a0f37d509..b32b31b5a9 100644
--- a/sysdeps/generic/tls-internal.c
+++ b/sysdeps/generic/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -27,13 +26,4 @@ __glibc_tls_internal_free (void)
 {
   free (__tls_internal.strsignal_buf);
   free (__tls_internal.strerror_l_buf);
-
-  if (__tls_internal.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-	 cached it won't leak any data.  */
-      explicit_bzero (__tls_internal.rand_state,
-		      sizeof (*__tls_internal.rand_state));
-      free (__tls_internal.rand_state);
-    }
 }
diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
index 667068c8cf..e60b86fab1 100644
--- a/sysdeps/mach/hurd/_Fork.c
+++ b/sysdeps/mach/hurd/_Fork.c
@@ -662,8 +662,6 @@ retry:
       _hurd_malloc_fork_child ();
       call_function_static_weak (__malloc_fork_unlock_child);
 
-      call_function_static_weak (__arc4random_fork_subprocess);
-
       /* Run things that want to run in the child task to set up.  */
       RUN_HOOK (_hurd_fork_child_hook, ());
 
diff --git a/sysdeps/mach/hurd/not-cancel.h b/sysdeps/mach/hurd/not-cancel.h
index 9a3a7ed59a..ae58b734e3 100644
--- a/sysdeps/mach/hurd/not-cancel.h
+++ b/sysdeps/mach/hurd/not-cancel.h
@@ -21,6 +21,7 @@
 
 #include <fcntl.h>
 #include <unistd.h>
+#include <poll.h>
 #include <sys/wait.h>
 #include <time.h>
 #include <sys/uio.h>
@@ -77,6 +78,9 @@ __typeof (__fcntl) __fcntl_nocancel;
 #define __getrandom_nocancel(buf, size, flags) \
   __getrandom (buf, size, flags)
 
+#define __poll_infinity_nocancel(fds, nfds) \
+  __poll (fds, nfds, -1)
+
 #if IS_IN (libc)
 hidden_proto (__close_nocancel)
 hidden_proto (__close_nocancel_nostatus)
diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index 7dc02569f6..dd568992e2 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -43,8 +43,6 @@ _Fork (void)
       self->robust_head.list = &self->robust_head;
       INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
 			     sizeof (struct robust_list_head));
-
-      call_function_static_weak (__arc4random_fork_subprocess);
     }
   return pid;
 }
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
deleted file mode 100644
index 8c75165f7f..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
+++ /dev/null
@@ -1,4 +0,0 @@
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
deleted file mode 100644
index cf9e735326..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
deleted file mode 100644
index 08494dc045..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
+++ /dev/null
@@ -1,42 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-					const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-		const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  unsigned long int hwcap = GLRO(dl_hwcap);
-  unsigned long int hwcap2 = GLRO(dl_hwcap2);
-  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
-    __chacha20_power8_blocks4 (state, dst, src,
-			       CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
index abb0aa3f11..71a59529f3 100644
--- a/sysdeps/powerpc/powerpc64/power8/Makefile
+++ b/sysdeps/powerpc/powerpc64/power8/Makefile
@@ -1,8 +1,3 @@
 ifeq ($(subdir),string)
 sysdep_routines += strcasestr-ppc64
 endif
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
deleted file mode 100644
index 0bbdcb9363..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/* Optimized PowerPC implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20
-   Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <altivec.h>
-#include <endian.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <sys/cdefs.h>
-
-typedef vector unsigned char vector16x_u8;
-typedef vector unsigned int vector4x_u32;
-typedef vector unsigned long long vector2x_u64;
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-static const vector16x_u8 le_bswap_const =
-  { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
-#endif
-
-static inline vector4x_u32
-vec_rol_elems (vector4x_u32 v, unsigned int idx)
-{
-#if __BYTE_ORDER != __BIG_ENDIAN
-  return vec_sld (v, v, (16 - (4 * idx)) & 15);
-#else
-  return vec_sld (v, v, (4 * idx) & 15);
-#endif
-}
-
-static inline vector4x_u32
-vec_load_le (unsigned long offset, const unsigned char *ptr)
-{
-  vector4x_u32 vec;
-  vec = vec_vsx_ld (offset, (const uint32_t *)ptr);
-#if __BYTE_ORDER == __BIG_ENDIAN
-  vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec,
-				 le_bswap_const);
-#endif
-  return vec;
-}
-
-static inline void
-vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
-  vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec,
-			       le_bswap_const);
-#endif
-  vec_vsx_st (vec, offset, (uint32_t *)ptr);
-}
-
-
-static inline vector4x_u32
-vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a)
-{
-#if __BYTE_ORDER == __BIG_ENDIAN
-  static const vector16x_u8 swap32 =
-    { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 };
-  vector2x_u64 vec, add, sum;
-
-  vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32);
-  add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32);
-  sum = vec + add;
-  return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32);
-#else
-  return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a));
-#endif
-}
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define ROTATE(v1,rolv)			\
-	__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv))
-
-#define PLUS(ds,s) \
-	((ds) += (s))
-
-#define XOR(ds,s) \
-	((ds) ^= (s))
-
-#define ADD_U64(v,a) \
-	(v = vec_add_ctr_u64(v, a))
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0, x1, x2, x3) ({ \
-	vector4x_u32 t1 = vec_mergeh(x0, x2); \
-	vector4x_u32 t2 = vec_mergel(x0, x2); \
-	vector4x_u32 t3 = vec_mergeh(x1, x3); \
-	x3 = vec_mergel(x1, x3); \
-	x0 = vec_mergeh(t1, t3); \
-	x1 = vec_mergel(t1, t3); \
-	x2 = vec_mergeh(t2, x3); \
-	x3 = vec_mergel(t2, x3); \
-      })
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2)			\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE(d1, rotate_16); ROTATE(d2, rotate_16);	\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE(b1, rotate_12); ROTATE(b2, rotate_12);	\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE(d1, rotate_8); ROTATE(d2, rotate_8);		\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE(b1, rotate_7); ROTATE(b2, rotate_7);
-
-unsigned int attribute_hidden
-__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src,
-			   size_t nblks)
-{
-  vector4x_u32 counters_0123 = { 0, 1, 2, 3 };
-  vector4x_u32 counter_4 = { 4, 0, 0, 0 };
-  vector4x_u32 rotate_16 = { 16, 16, 16, 16 };
-  vector4x_u32 rotate_12 = { 12, 12, 12, 12 };
-  vector4x_u32 rotate_8 = { 8, 8, 8, 8 };
-  vector4x_u32 rotate_7 = { 7, 7, 7, 7 };
-  vector4x_u32 state0, state1, state2, state3;
-  vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7;
-  vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15;
-  vector4x_u32 tmp;
-  int i;
-
-  /* Force preload of constants to vector registers.  */
-  __asm__ ("": "+v" (counters_0123) :: "memory");
-  __asm__ ("": "+v" (counter_4) :: "memory");
-  __asm__ ("": "+v" (rotate_16) :: "memory");
-  __asm__ ("": "+v" (rotate_12) :: "memory");
-  __asm__ ("": "+v" (rotate_8) :: "memory");
-  __asm__ ("": "+v" (rotate_7) :: "memory");
-
-  state0 = vec_vsx_ld (0 * 16, state);
-  state1 = vec_vsx_ld (1 * 16, state);
-  state2 = vec_vsx_ld (2 * 16, state);
-  state3 = vec_vsx_ld (3 * 16, state);
-
-  do
-    {
-      v0 = vec_splat (state0, 0);
-      v1 = vec_splat (state0, 1);
-      v2 = vec_splat (state0, 2);
-      v3 = vec_splat (state0, 3);
-      v4 = vec_splat (state1, 0);
-      v5 = vec_splat (state1, 1);
-      v6 = vec_splat (state1, 2);
-      v7 = vec_splat (state1, 3);
-      v8 = vec_splat (state2, 0);
-      v9 = vec_splat (state2, 1);
-      v10 = vec_splat (state2, 2);
-      v11 = vec_splat (state2, 3);
-      v12 = vec_splat (state3, 0);
-      v13 = vec_splat (state3, 1);
-      v14 = vec_splat (state3, 2);
-      v15 = vec_splat (state3, 3);
-
-      v12 += counters_0123;
-      v13 -= vec_cmplt (v12, counters_0123);
-
-      for (i = 20; i > 0; i -= 2)
-	{
-	  QUARTERROUND2 (v0, v4,  v8, v12,   v1, v5,  v9, v13)
-	  QUARTERROUND2 (v2, v6, v10, v14,   v3, v7, v11, v15)
-	  QUARTERROUND2 (v0, v5, v10, v15,   v1, v6, v11, v12)
-	  QUARTERROUND2 (v2, v7,  v8, v13,   v3, v4,  v9, v14)
-	}
-
-      v0 += vec_splat (state0, 0);
-      v1 += vec_splat (state0, 1);
-      v2 += vec_splat (state0, 2);
-      v3 += vec_splat (state0, 3);
-      v4 += vec_splat (state1, 0);
-      v5 += vec_splat (state1, 1);
-      v6 += vec_splat (state1, 2);
-      v7 += vec_splat (state1, 3);
-      v8 += vec_splat (state2, 0);
-      v9 += vec_splat (state2, 1);
-      v10 += vec_splat (state2, 2);
-      v11 += vec_splat (state2, 3);
-      tmp = vec_splat( state3, 0);
-      tmp += counters_0123;
-      v12 += tmp;
-      v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123);
-      v14 += vec_splat (state3, 2);
-      v15 += vec_splat (state3, 3);
-      ADD_U64 (state3, counter_4);
-
-      transpose_4x4 (v0, v1, v2, v3);
-      transpose_4x4 (v4, v5, v6, v7);
-      transpose_4x4 (v8, v9, v10, v11);
-      transpose_4x4 (v12, v13, v14, v15);
-
-      vec_store_le (v0, (64 * 0 + 16 * 0), dst);
-      vec_store_le (v1, (64 * 1 + 16 * 0), dst);
-      vec_store_le (v2, (64 * 2 + 16 * 0), dst);
-      vec_store_le (v3, (64 * 3 + 16 * 0), dst);
-
-      vec_store_le (v4, (64 * 0 + 16 * 1), dst);
-      vec_store_le (v5, (64 * 1 + 16 * 1), dst);
-      vec_store_le (v6, (64 * 2 + 16 * 1), dst);
-      vec_store_le (v7, (64 * 3 + 16 * 1), dst);
-
-      vec_store_le (v8, (64 * 0 + 16 * 2), dst);
-      vec_store_le (v9, (64 * 1 + 16 * 2), dst);
-      vec_store_le (v10, (64 * 2 + 16 * 2), dst);
-      vec_store_le (v11, (64 * 3 + 16 * 2), dst);
-
-      vec_store_le (v12, (64 * 0 + 16 * 3), dst);
-      vec_store_le (v13, (64 * 1 + 16 * 3), dst);
-      vec_store_le (v14, (64 * 2 + 16 * 3), dst);
-      vec_store_le (v15, (64 * 3 + 16 * 3), dst);
-
-      src += 4*64;
-      dst += 4*64;
-
-      nblks -= 4;
-    }
-  while (nblks);
-
-  vec_vsx_st (state3, 3 * 16, state);
-
-  return 0;
-}
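For readers following the removed POWER8 code: the QUARTERROUND2 macro above runs two ChaCha quarter rounds side by side on vector registers, using the add, xor, rotate sequence with rotation amounts 16, 12, 8 and 7. A minimal scalar sketch of a single quarter round is shown below. The helper names rotl32 and chacha_quarterround are illustrative only and do not appear in the patch.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits, for 0 < n < 32.  */
static inline uint32_t
rotl32 (uint32_t x, unsigned int n)
{
  return (x << n) | (x >> (32 - n));
}

/* One ChaCha quarter round on four state words.  The name is
   hypothetical; the step order mirrors the PLUS/XOR/ROTATE sequence
   in the QUARTERROUND2 macro.  */
static inline void
chacha_quarterround (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 7);
}
```

The vector code applies the same steps to four blocks at once by keeping word i of every block in one register, which is why it needs the transpose_4x4 step before storing.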
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
deleted file mode 100644
index ded06762b6..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
+++ /dev/null
@@ -1,37 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-					const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-		const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  __chacha20_power8_blocks4 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-}
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index 96c110f490..66ed844e68 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache
 endif
 
 endif # $(subdir) == elf
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
-  chacha20-s390x \
-  # sysdep_routines
-endif
diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S
deleted file mode 100644
index e38504d370..0000000000
--- a/sysdeps/s390/s390-64/chacha20-s390x.S
+++ /dev/null
@@ -1,573 +0,0 @@
-/* Optimized s390x implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-s390x.S  -  zSeries implementation of ChaCha20 cipher
-
-   Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */
-
-#include <sysdep.h>
-
-#ifdef HAVE_S390_VX_ASM_SUPPORT
-
-/* CFA expressions are used for pointing CFA and registers to
- * SP relative offsets. */
-# define DW_REGNO_SP 15
-
-/* Fixed length encoding used for integers for now. */
-# define DW_SLEB128_7BIT(value) \
-        0x00|((value) & 0x7f)
-# define DW_SLEB128_28BIT(value) \
-        0x80|((value)&0x7f), \
-        0x80|(((value)>>7)&0x7f), \
-        0x80|(((value)>>14)&0x7f), \
-        0x00|(((value)>>21)&0x7f)
-
-# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \
-        .cfi_escape \
-          0x0f, /* DW_CFA_def_cfa_expression */ \
-            DW_SLEB128_7BIT(11), /* length */ \
-          0x7f, /* DW_OP_breg15, rsp + constant */ \
-            DW_SLEB128_28BIT(rsp_offs), \
-          0x06, /* DW_OP_deref */ \
-          0x23, /* DW_OP_plus_constu */ \
-            DW_SLEB128_28BIT((cfa_depth)+160)
-
-.machine "z13+vx"
-.text
-
-.balign 16
-.Lconsts:
-.Lwordswap:
-	.byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3
-.Lbswap128:
-	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-.Lbswap32:
-	.byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
-.Lone:
-	.long 0, 0, 0, 1
-.Ladd_counter_0123:
-	.long 0, 1, 2, 3
-.Ladd_counter_4567:
-	.long 4, 5, 6, 7
-
-/* register macros */
-#define INPUT %r2
-#define DST   %r3
-#define SRC   %r4
-#define NBLKS %r0
-#define ROUND %r1
-
-/* stack structure */
-
-#define STACK_FRAME_STD    (8 * 16 + 8 * 4)
-#define STACK_FRAME_F8_F15 (8 * 8)
-#define STACK_FRAME_Y0_Y15 (16 * 16)
-#define STACK_FRAME_CTR    (4 * 16)
-#define STACK_FRAME_PARAMS (6 * 8)
-
-#define STACK_MAX   (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \
-		     STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \
-		     STACK_FRAME_PARAMS)
-
-#define STACK_F8     (STACK_MAX - STACK_FRAME_F8_F15)
-#define STACK_F9     (STACK_F8 + 8)
-#define STACK_F10    (STACK_F9 + 8)
-#define STACK_F11    (STACK_F10 + 8)
-#define STACK_F12    (STACK_F11 + 8)
-#define STACK_F13    (STACK_F12 + 8)
-#define STACK_F14    (STACK_F13 + 8)
-#define STACK_F15    (STACK_F14 + 8)
-#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15)
-#define STACK_CTR    (STACK_Y0_Y15 - STACK_FRAME_CTR)
-#define STACK_INPUT  (STACK_CTR - STACK_FRAME_PARAMS)
-#define STACK_DST    (STACK_INPUT + 8)
-#define STACK_SRC    (STACK_DST + 8)
-#define STACK_NBLKS  (STACK_SRC + 8)
-#define STACK_POCTX  (STACK_NBLKS + 8)
-#define STACK_POSRC  (STACK_POCTX + 8)
-
-#define STACK_G0_H3  STACK_Y0_Y15
-
-/* vector registers */
-#define A0 %v0
-#define A1 %v1
-#define A2 %v2
-#define A3 %v3
-
-#define B0 %v4
-#define B1 %v5
-#define B2 %v6
-#define B3 %v7
-
-#define C0 %v8
-#define C1 %v9
-#define C2 %v10
-#define C3 %v11
-
-#define D0 %v12
-#define D1 %v13
-#define D2 %v14
-#define D3 %v15
-
-#define E0 %v16
-#define E1 %v17
-#define E2 %v18
-#define E3 %v19
-
-#define F0 %v20
-#define F1 %v21
-#define F2 %v22
-#define F3 %v23
-
-#define G0 %v24
-#define G1 %v25
-#define G2 %v26
-#define G3 %v27
-
-#define H0 %v28
-#define H1 %v29
-#define H2 %v30
-#define H3 %v31
-
-#define IO0 E0
-#define IO1 E1
-#define IO2 E2
-#define IO3 E3
-#define IO4 F0
-#define IO5 F1
-#define IO6 F2
-#define IO7 F3
-
-#define S0 G0
-#define S1 G1
-#define S2 G2
-#define S3 G3
-
-#define TMP0 H0
-#define TMP1 H1
-#define TMP2 H2
-#define TMP3 H3
-
-#define X0 A0
-#define X1 A1
-#define X2 A2
-#define X3 A3
-#define X4 B0
-#define X5 B1
-#define X6 B2
-#define X7 B3
-#define X8 C0
-#define X9 C1
-#define X10 C2
-#define X11 C3
-#define X12 D0
-#define X13 D1
-#define X14 D2
-#define X15 D3
-
-#define Y0 E0
-#define Y1 E1
-#define Y2 E2
-#define Y3 E3
-#define Y4 F0
-#define Y5 F1
-#define Y6 F2
-#define Y7 F3
-#define Y8 G0
-#define Y9 G1
-#define Y10 G2
-#define Y11 G3
-#define Y12 H0
-#define Y13 H1
-#define Y14 H2
-#define Y15 H3
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-#define _ /*_*/
-
-#define START_STACK(last_r) \
-	lgr %r0, %r15; \
-	lghi %r1, ~15; \
-	stmg %r6, last_r, 6 * 8(%r15); \
-	aghi %r0, -STACK_MAX; \
-	ngr %r0, %r1; \
-	lgr %r1, %r15; \
-	cfi_def_cfa_register(1); \
-	lgr %r15, %r0; \
-	stg %r1, 0(%r15); \
-	cfi_cfa_on_stack(0, 0); \
-	std %f8, STACK_F8(%r15); \
-	std %f9, STACK_F9(%r15); \
-	std %f10, STACK_F10(%r15); \
-	std %f11, STACK_F11(%r15); \
-	std %f12, STACK_F12(%r15); \
-	std %f13, STACK_F13(%r15); \
-	std %f14, STACK_F14(%r15); \
-	std %f15, STACK_F15(%r15);
-
-#define END_STACK(last_r) \
-	lg %r1, 0(%r15); \
-	ld %f8, STACK_F8(%r15); \
-	ld %f9, STACK_F9(%r15); \
-	ld %f10, STACK_F10(%r15); \
-	ld %f11, STACK_F11(%r15); \
-	ld %f12, STACK_F12(%r15); \
-	ld %f13, STACK_F13(%r15); \
-	ld %f14, STACK_F14(%r15); \
-	ld %f15, STACK_F15(%r15); \
-	lmg %r6, last_r, 6 * 8(%r1); \
-	lgr %r15, %r1; \
-	cfi_def_cfa_register(DW_REGNO_SP);
-
-#define PLUS(dst,src) \
-	vaf dst, dst, src;
-
-#define XOR(dst,src) \
-	vx dst, dst, src;
-
-#define ROTATE(v1,c) \
-	verllf v1, v1, (c)(0);
-
-#define WORD_ROTATE(v1,s) \
-	vsldb v1, v1, v1, ((s) * 4);
-
-#define DST_8(OPER, I, J) \
-	OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \
-	OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J);
-
-/**********************************************************************
-  round macros
- **********************************************************************/
-
-/**********************************************************************
-  8-way chacha20 ("vertical")
- **********************************************************************/
-
-#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
-			      x8,x9,x10,x11,x12,x13,x14,x15,\
-			      y0,y1,y2,y3,y4,y5,y6,y7,\
-			      y8,y9,y10,y11,y12,y13,y14,y15,\
-			      op1,op2,op3,op4,op5,op6,op7,op8,\
-			      op9,op10,op11,op12) \
-	op1;							\
-	PLUS(x0, x1); PLUS(x4, x5);				\
-	PLUS(x8, x9); PLUS(x12, x13);				\
-	PLUS(y0, y1); PLUS(y4, y5);				\
-	PLUS(y8, y9); PLUS(y12, y13);				\
-	    op2;						\
-	    XOR(x3, x0);  XOR(x7, x4);				\
-	    XOR(x11, x8); XOR(x15, x12);			\
-	    XOR(y3, y0);  XOR(y7, y4);				\
-	    XOR(y11, y8); XOR(y15, y12);			\
-		op3;						\
-		ROTATE(x3, 16); ROTATE(x7, 16);			\
-		ROTATE(x11, 16); ROTATE(x15, 16);		\
-		ROTATE(y3, 16); ROTATE(y7, 16);			\
-		ROTATE(y11, 16); ROTATE(y15, 16);		\
-	op4;							\
-	PLUS(x2, x3); PLUS(x6, x7);				\
-	PLUS(x10, x11); PLUS(x14, x15);				\
-	PLUS(y2, y3); PLUS(y6, y7);				\
-	PLUS(y10, y11); PLUS(y14, y15);				\
-	    op5;						\
-	    XOR(x1, x2); XOR(x5, x6);				\
-	    XOR(x9, x10); XOR(x13, x14);			\
-	    XOR(y1, y2); XOR(y5, y6);				\
-	    XOR(y9, y10); XOR(y13, y14);			\
-		op6;						\
-		ROTATE(x1,12); ROTATE(x5,12);			\
-		ROTATE(x9,12); ROTATE(x13,12);			\
-		ROTATE(y1,12); ROTATE(y5,12);			\
-		ROTATE(y9,12); ROTATE(y13,12);			\
-	op7;							\
-	PLUS(x0, x1); PLUS(x4, x5);				\
-	PLUS(x8, x9); PLUS(x12, x13);				\
-	PLUS(y0, y1); PLUS(y4, y5);				\
-	PLUS(y8, y9); PLUS(y12, y13);				\
-	    op8;						\
-	    XOR(x3, x0); XOR(x7, x4);				\
-	    XOR(x11, x8); XOR(x15, x12);			\
-	    XOR(y3, y0); XOR(y7, y4);				\
-	    XOR(y11, y8); XOR(y15, y12);			\
-		op9;						\
-		ROTATE(x3,8); ROTATE(x7,8);			\
-		ROTATE(x11,8); ROTATE(x15,8);			\
-		ROTATE(y3,8); ROTATE(y7,8);			\
-		ROTATE(y11,8); ROTATE(y15,8);			\
-	op10;							\
-	PLUS(x2, x3); PLUS(x6, x7);				\
-	PLUS(x10, x11); PLUS(x14, x15);				\
-	PLUS(y2, y3); PLUS(y6, y7);				\
-	PLUS(y10, y11); PLUS(y14, y15);				\
-	    op11;						\
-	    XOR(x1, x2); XOR(x5, x6);				\
-	    XOR(x9, x10); XOR(x13, x14);			\
-	    XOR(y1, y2); XOR(y5, y6);				\
-	    XOR(y9, y10); XOR(y13, y14);			\
-		op12;						\
-		ROTATE(x1,7); ROTATE(x5,7);			\
-		ROTATE(x9,7); ROTATE(x13,7);			\
-		ROTATE(y1,7); ROTATE(y5,7);			\
-		ROTATE(y9,7); ROTATE(y13,7);
-
-#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\
-			 y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \
-	QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
-			      x8,x9,x10,x11,x12,x13,x14,x15,\
-			      y0,y1,y2,y3,y4,y5,y6,y7,\
-			      y8,y9,y10,y11,y12,y13,y14,y15,\
-			      ,,,,,,,,,,,)
-
-#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \
-	  vmrhf tmp0, v0, v1;					\
-	  vmrhf tmp1, v2, v3;					\
-	  vmrlf tmp2, v0, v1;					\
-	  vmrlf   v3, v2, v3;					\
-	  vmrhf tmpa, va, vb;					\
-	  vmrhf tmpb, vc, vd;					\
-	  vmrlf tmpc, va, vb;					\
-	  vmrlf   vd, vc, vd;					\
-	  vpdi v0, tmp0, tmp1, 0;				\
-	  vpdi v1, tmp0, tmp1, 5;				\
-	  vpdi v2, tmp2,   v3, 0;				\
-	  vpdi v3, tmp2,   v3, 5;				\
-	  vpdi va, tmpa, tmpb, 0;				\
-	  vpdi vb, tmpa, tmpb, 5;				\
-	  vpdi vc, tmpc,   vd, 0;				\
-	  vpdi vd, tmpc,   vd, 5;
-
-.balign 8
-.globl __chacha20_s390x_vx_blocks8
-ENTRY (__chacha20_s390x_vx_blocks8)
-	/* input:
-	 *	%r2: input
-	 *	%r3: dst
-	 *	%r4: src
-	 *	%r5: nblks (multiple of 8)
-	 */
-
-	START_STACK(%r8);
-	lgr NBLKS, %r5;
-
-	larl %r7, .Lconsts;
-
-	/* Load counter. */
-	lg %r8, (12 * 4)(INPUT);
-	rllg %r8, %r8, 32;
-
-.balign 4
-	/* Process eight chacha20 blocks per loop. */
-.Lloop8:
-	vlm Y0, Y3, 0(INPUT);
-
-	slgfi NBLKS, 8;
-	lghi ROUND, (20 / 2);
-
-	/* Construct counter vectors X12/X13 & Y12/Y13. */
-	vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7);
-	vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7);
-	vrepf Y12, Y3, 0;
-	vrepf Y13, Y3, 1;
-	vaccf X5, Y12, X4;
-	vaccf Y5, Y12, Y4;
-	vaf X12, Y12, X4;
-	vaf Y12, Y12, Y4;
-	vaf X13, Y13, X5;
-	vaf Y13, Y13, Y5;
-
-	vrepf X0, Y0, 0;
-	vrepf X1, Y0, 1;
-	vrepf X2, Y0, 2;
-	vrepf X3, Y0, 3;
-	vrepf X4, Y1, 0;
-	vrepf X5, Y1, 1;
-	vrepf X6, Y1, 2;
-	vrepf X7, Y1, 3;
-	vrepf X8, Y2, 0;
-	vrepf X9, Y2, 1;
-	vrepf X10, Y2, 2;
-	vrepf X11, Y2, 3;
-	vrepf X14, Y3, 2;
-	vrepf X15, Y3, 3;
-
-	/* Store counters for blocks 0-7. */
-	vstm X12, X13, (STACK_CTR + 0 * 16)(%r15);
-	vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15);
-
-	vlr Y0, X0;
-	vlr Y1, X1;
-	vlr Y2, X2;
-	vlr Y3, X3;
-	vlr Y4, X4;
-	vlr Y5, X5;
-	vlr Y6, X6;
-	vlr Y7, X7;
-	vlr Y8, X8;
-	vlr Y9, X9;
-	vlr Y10, X10;
-	vlr Y11, X11;
-	vlr Y14, X14;
-	vlr Y15, X15;
-
-	/* Update and store counter. */
-	agfi %r8, 8;
-	rllg %r5, %r8, 32;
-	stg %r5, (12 * 4)(INPUT);
-
-.balign 4
-.Lround2_8:
-	QUARTERROUND4_V8(X0, X4,  X8, X12,   X1, X5,  X9, X13,
-			 X2, X6, X10, X14,   X3, X7, X11, X15,
-			 Y0, Y4,  Y8, Y12,   Y1, Y5,  Y9, Y13,
-			 Y2, Y6, Y10, Y14,   Y3, Y7, Y11, Y15);
-	QUARTERROUND4_V8(X0, X5, X10, X15,   X1, X6, X11, X12,
-			 X2, X7,  X8, X13,   X3, X4,  X9, X14,
-			 Y0, Y5, Y10, Y15,   Y1, Y6, Y11, Y12,
-			 Y2, Y7,  Y8, Y13,   Y3, Y4,  Y9, Y14);
-	brctg ROUND, .Lround2_8;
-
-	/* Store blocks 4-7. */
-	vstm Y0, Y15, STACK_Y0_Y15(%r15);
-
-	/* Load counters for blocks 0-3. */
-	vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15);
-
-	lghi ROUND, 1;
-	j .Lfirst_output_4blks_8;
-
-.balign 4
-.Lsecond_output_4blks_8:
-	/* Load blocks 4-7. */
-	vlm X0, X15, STACK_Y0_Y15(%r15);
-
-	/* Load counters for blocks 4-7. */
-	vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15);
-
-	lghi ROUND, 0;
-
-.balign 4
-	/* Output four chacha20 blocks per loop. */
-.Lfirst_output_4blks_8:
-	vlm Y12, Y15, 0(INPUT);
-	PLUS(X12, Y0);
-	PLUS(X13, Y1);
-	vrepf Y0, Y12, 0;
-	vrepf Y1, Y12, 1;
-	vrepf Y2, Y12, 2;
-	vrepf Y3, Y12, 3;
-	vrepf Y4, Y13, 0;
-	vrepf Y5, Y13, 1;
-	vrepf Y6, Y13, 2;
-	vrepf Y7, Y13, 3;
-	vrepf Y8, Y14, 0;
-	vrepf Y9, Y14, 1;
-	vrepf Y10, Y14, 2;
-	vrepf Y11, Y14, 3;
-	vrepf Y14, Y15, 2;
-	vrepf Y15, Y15, 3;
-	PLUS(X0, Y0);
-	PLUS(X1, Y1);
-	PLUS(X2, Y2);
-	PLUS(X3, Y3);
-	PLUS(X4, Y4);
-	PLUS(X5, Y5);
-	PLUS(X6, Y6);
-	PLUS(X7, Y7);
-	PLUS(X8, Y8);
-	PLUS(X9, Y9);
-	PLUS(X10, Y10);
-	PLUS(X11, Y11);
-	PLUS(X14, Y14);
-	PLUS(X15, Y15);
-
-	vl Y15, (.Lbswap32 - .Lconsts)(%r7);
-	TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7,
-			Y9, Y10, Y11, Y12, Y13, Y14);
-	TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15,
-			Y9, Y10, Y11, Y12, Y13, Y14);
-
-	vlm Y0, Y14, 0(SRC);
-	vperm X0, X0, X0, Y15;
-	vperm X1, X1, X1, Y15;
-	vperm X2, X2, X2, Y15;
-	vperm X3, X3, X3, Y15;
-	vperm X4, X4, X4, Y15;
-	vperm X5, X5, X5, Y15;
-	vperm X6, X6, X6, Y15;
-	vperm X7, X7, X7, Y15;
-	vperm X8, X8, X8, Y15;
-	vperm X9, X9, X9, Y15;
-	vperm X10, X10, X10, Y15;
-	vperm X11, X11, X11, Y15;
-	vperm X12, X12, X12, Y15;
-	vperm X13, X13, X13, Y15;
-	vperm X14, X14, X14, Y15;
-	vperm X15, X15, X15, Y15;
-	vl Y15, (15 * 16)(SRC);
-
-	XOR(Y0, X0);
-	XOR(Y1, X4);
-	XOR(Y2, X8);
-	XOR(Y3, X12);
-	XOR(Y4, X1);
-	XOR(Y5, X5);
-	XOR(Y6, X9);
-	XOR(Y7, X13);
-	XOR(Y8, X2);
-	XOR(Y9, X6);
-	XOR(Y10, X10);
-	XOR(Y11, X14);
-	XOR(Y12, X3);
-	XOR(Y13, X7);
-	XOR(Y14, X11);
-	XOR(Y15, X15);
-	vstm Y0, Y15, 0(DST);
-
-	aghi SRC, 256;
-	aghi DST, 256;
-
-	clgije ROUND, 1, .Lsecond_output_4blks_8;
-
-	clgijhe NBLKS, 8, .Lloop8;
-
-
-	END_STACK(%r8);
-	xgr %r2, %r2;
-	br %r14;
-END (__chacha20_s390x_vx_blocks8)
-
-#endif /* HAVE_S390_VX_ASM_SUPPORT */
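The "Construct counter vectors" step in the removed s390x code treats state words 12 and 13 as one 64-bit block counter: vaf adds the per-lane offsets to the low word, and vaccf produces the carry that is then added to the high word. A scalar sketch of that idea for four lanes follows. The struct and function names are made up for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Per-lane block counters, split into 32-bit halves the way the
   ChaCha state stores them (hypothetical type).  */
struct lane_counters
{
  uint32_t lo[4];
  uint32_t hi[4];
};

/* Give lane i the counter base + i, propagating the carry from the
   low word into the high word.  */
static struct lane_counters
make_lane_counters (uint32_t base_lo, uint32_t base_hi)
{
  struct lane_counters c;
  for (uint32_t i = 0; i < 4; i++)
    {
      c.lo[i] = base_lo + i;          /* Wraps modulo 2^32.  */
      uint32_t carry = c.lo[i] < i;   /* 1 exactly when the add wrapped.  */
      c.hi[i] = base_hi + carry;
    }
  return c;
}
```

The POWER8 and AVX2 variants in this patch get the same carry from an unsigned less-than comparison of the sum against the offset, which is the test used in the sketch.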
diff --git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h
deleted file mode 100644
index 0c6abf77e8..0000000000
--- a/sysdeps/s390/s390-64/chacha20_arch.h
+++ /dev/null
@@ -1,45 +0,0 @@
-/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-#include <sys/auxv.h>
-
-unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst,
-					  const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-#ifdef HAVE_S390_VX_ASM_SUPPORT
-  _Static_assert (CHACHA20_BUFSIZE % 8 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 8");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-  if (GLRO(dl_hwcap) & HWCAP_S390_VX)
-    {
-      __chacha20_s390x_vx_blocks8 (state, dst, src,
-				   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-      return;
-    }
-#endif
-  chacha20_crypt_generic (state, dst, src, bytes);
-}
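The removed chacha20_crypt wrapper above picks the vector routine only when the hwcap word advertises the feature, and otherwise falls back to the generic C code. A portable sketch of that selection pattern is shown below. Everything in it is a stand-in: EXAMPLE_HWCAP_VECTOR is not the real HWCAP_S390_VX value, and the two placeholder functions only record which path ran.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*chacha20_fn) (uint32_t *state, uint8_t *dst,
                             const uint8_t *src, size_t bytes);

/* Hypothetical feature bit, standing in for a real HWCAP_* flag.  */
#define EXAMPLE_HWCAP_VECTOR (1UL << 3)

/* Records which placeholder ran last: 1 = generic, 2 = vector.  */
static int last_impl;

static void
crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
               size_t bytes)
{
  (void) state; (void) dst; (void) src; (void) bytes;
  last_impl = 1;
}

static void
crypt_vector (uint32_t *state, uint8_t *dst, const uint8_t *src,
              size_t bytes)
{
  (void) state; (void) dst; (void) src; (void) bytes;
  last_impl = 2;
}

/* Choose an implementation from a hwcap word.  */
static chacha20_fn
select_chacha20 (unsigned long hwcap)
{
  return (hwcap & EXAMPLE_HWCAP_VECTOR) ? crypt_vector : crypt_generic;
}
```

Outside of libc internals, the hwcap word would typically come from getauxval (AT_HWCAP) in <sys/auxv.h>. The patch itself reads it through GLRO(dl_hwcap).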
diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h
index 2c58d5ae2f..a263d294b1 100644
--- a/sysdeps/unix/sysv/linux/not-cancel.h
+++ b/sysdeps/unix/sysv/linux/not-cancel.h
@@ -23,6 +23,7 @@
 #include <sysdep.h>
 #include <errno.h>
 #include <unistd.h>
+#include <sys/poll.h>
 #include <sys/syscall.h>
 #include <sys/wait.h>
 #include <time.h>
@@ -70,9 +71,14 @@ __writev_nocancel_nostatus (int fd, const struct iovec *iov, int iovcnt)
 static inline int
 __getrandom_nocancel (void *buf, size_t buflen, unsigned int flags)
 {
-  return INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags);
+  return INLINE_SYSCALL_CALL (getrandom, buf, buflen, flags);
 }
 
+static inline int
+__poll_infinity_nocancel (struct pollfd *fds, nfds_t nfds)
+{
+  return INLINE_SYSCALL_CALL (ppoll, fds, nfds, NULL, NULL, 0);
+}
 
 /* Uncancelable fcntl.  */
 __typeof (__fcntl) __fcntl64_nocancel;
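A note on the not-cancel.h hunk above: as far as I understand the glibc macros, INTERNAL_SYSCALL_CALL hands back the raw kernel result, where a failure is a negative errno value, while INLINE_SYSCALL_CALL converts a failure into a return of -1 with errno set. So after this change callers of __getrandom_nocancel can test errno in the usual way. The sketch below illustrates that conversion on Linux. The function name is hypothetical, and the 4095 bound is the kernel's maximum errno value.

```c
#include <errno.h>

/* Convert a raw Linux syscall result to the libc convention:
   values in [-4095, -1] mean failure, so store the positive error
   code in errno and return -1; anything else passes through.  */
static long
syscall_result_to_errno (long raw)
{
  if (raw < 0 && raw >= -4095)
    {
      errno = (int) -raw;
      return -1;
    }
  return raw;
}
```

With the raw convention a caller would compare the result against -EINTR directly. With the converted one it checks for -1 and then inspects errno.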
diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c
index 0326ebb767..c8a9ed2d40 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.c
+++ b/sysdeps/unix/sysv/linux/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -26,13 +25,4 @@ __glibc_tls_internal_free (void)
   struct pthread *self = THREAD_SELF;
   free (self->tls_state.strsignal_buf);
   free (self->tls_state.strerror_l_buf);
-
-  if (self->tls_state.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-         cached it won't leak any data.  */
-      explicit_bzero (self->tls_state.rand_state,
-		      sizeof (*self->tls_state.rand_state));
-      free (self->tls_state.rand_state);
-    }
 }
diff --git a/sysdeps/unix/sysv/linux/tls-internal.h b/sysdeps/unix/sysv/linux/tls-internal.h
index ebc65d896a..2ebe977802 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.h
+++ b/sysdeps/unix/sysv/linux/tls-internal.h
@@ -28,7 +28,6 @@ __glibc_tls_internal (void)
   return &THREAD_SELF->tls_state;
 }
 
-/* Reset the arc4random TCB state on fork.  */
 extern void __glibc_tls_internal_free (void) attribute_hidden;
 
 #endif
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 1178475d75..c19bef2dec 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -5,13 +5,6 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += link-defines.sym
 endif
 
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
-  chacha20-amd64-sse2 \
-  chacha20-amd64-avx2 \
-  # sysdep_routines
-endif
-
 ifeq ($(subdir),gmon)
 sysdep_routines += _mcount
 # We cannot compile _mcount.S with -pg because that would create
diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S
deleted file mode 100644
index aefd1cdbd0..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-avx2.S
+++ /dev/null
@@ -1,328 +0,0 @@
-/* Optimized AVX2 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-amd64-avx2.S  -  AVX2 implementation of ChaCha20 cipher
-
-   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-
-#ifdef PIC
-#  define rRIP (%rip)
-#else
-#  define rRIP
-#endif
-
-/* register macros */
-#define INPUT %rdi
-#define DST   %rsi
-#define SRC   %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (32)
-#define STACK_VEC_X13 (32 + STACK_VEC_X12)
-#define STACK_TMP     (32 + STACK_VEC_X13)
-#define STACK_TMP1    (32 + STACK_TMP)
-
-#define STACK_MAX     (32 + STACK_TMP1)
-
-/* vector registers */
-#define X0 %ymm0
-#define X1 %ymm1
-#define X2 %ymm2
-#define X3 %ymm3
-#define X4 %ymm4
-#define X5 %ymm5
-#define X6 %ymm6
-#define X7 %ymm7
-#define X8 %ymm8
-#define X9 %ymm9
-#define X10 %ymm10
-#define X11 %ymm11
-#define X12 %ymm12
-#define X13 %ymm13
-#define X14 %ymm14
-#define X15 %ymm15
-
-#define X0h %xmm0
-#define X1h %xmm1
-#define X2h %xmm2
-#define X3h %xmm3
-#define X4h %xmm4
-#define X5h %xmm5
-#define X6h %xmm6
-#define X7h %xmm7
-#define X8h %xmm8
-#define X9h %xmm9
-#define X10h %xmm10
-#define X11h %xmm11
-#define X12h %xmm12
-#define X13h %xmm13
-#define X14h %xmm14
-#define X15h %xmm15
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define transpose_4x4(x0,x1,x2,x3,t1,t2) \
-	vpunpckhdq x1, x0, t2; \
-	vpunpckldq x1, x0, x0; \
-	\
-	vpunpckldq x3, x2, t1; \
-	vpunpckhdq x3, x2, x2; \
-	\
-	vpunpckhqdq t1, x0, x1; \
-	vpunpcklqdq t1, x0, x0; \
-	\
-	vpunpckhqdq x2, t2, x3; \
-	vpunpcklqdq x2, t2, x2;
-
-/* 2x2 128-bit matrix transpose */
-#define transpose_16byte_2x2(x0,x1,t1) \
-	vmovdqa    x0, t1; \
-	vperm2i128 $0x20, x1, x0, x0; \
-	vperm2i128 $0x31, x1, t1, x1;
-
-/**********************************************************************
-  8-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp)	\
-	vpsrld $(32 - (c)), v1, tmp;	\
-	vpslld $(c), v1, v1;		\
-	vpaddb tmp, v1, v1;		\
-	vpsrld $(32 - (c)), v2, tmp;	\
-	vpslld $(c), v2, v2;		\
-	vpaddb tmp, v2, v2;
-
-#define ROTATE_SHUF_2(v1,v2,shuf)	\
-	vpshufb shuf, v1, v1;		\
-	vpshufb shuf, v2, v2;
-
-#define XOR(ds,s) \
-	vpxor s, ds, ds;
-
-#define PLUS(ds,s) \
-	vpaddd s, ds, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\
-		      interleave_op1,interleave_op2,\
-		      interleave_op3,interleave_op4)		\
-	vbroadcasti128 .Lshuf_rol16 rRIP, tmp1;			\
-		interleave_op1;					\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE_SHUF_2(d1, d2, tmp1);			\
-		interleave_op2;					\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2, 12, tmp1);				\
-	vbroadcasti128 .Lshuf_rol8 rRIP, tmp1;			\
-		interleave_op3;					\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE_SHUF_2(d1, d2, tmp1);			\
-		interleave_op4;					\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2,  7, tmp1);
-
-	.section .text.avx2, "ax", @progbits
-	.align 32
-chacha20_data:
-L(shuf_rol16):
-	.byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
-L(shuf_rol8):
-	.byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
-L(inc_counter):
-	.byte 0,1,2,3,4,5,6,7
-L(unsigned_cmp):
-	.long 0x80000000
-
-	.hidden __chacha20_avx2_blocks8
-ENTRY (__chacha20_avx2_blocks8)
-	/* input:
-	 *	%rdi: input
-	 *	%rsi: dst
-	 *	%rdx: src
-	 *	%rcx: nblks (multiple of 8)
-	 */
-	vzeroupper;
-
-	pushq %rbp;
-	cfi_adjust_cfa_offset(8);
-	cfi_rel_offset(rbp, 0)
-	movq %rsp, %rbp;
-	cfi_def_cfa_register(rbp);
-
-	subq $STACK_MAX, %rsp;
-	andq $~31, %rsp;
-
-L(loop8):
-	mov $20, ROUND;
-
-	/* Construct counter vectors X12 and X13 */
-	vpmovzxbd L(inc_counter) rRIP, X0;
-	vpbroadcastd L(unsigned_cmp) rRIP, X2;
-	vpbroadcastd (12 * 4)(INPUT), X12;
-	vpbroadcastd (13 * 4)(INPUT), X13;
-	vpaddd X0, X12, X12;
-	vpxor X2, X0, X0;
-	vpxor X2, X12, X1;
-	vpcmpgtd X1, X0, X0;
-	vpsubd X0, X13, X13;
-	vmovdqa X12, (STACK_VEC_X12)(%rsp);
-	vmovdqa X13, (STACK_VEC_X13)(%rsp);
-
-	/* Load vectors */
-	vpbroadcastd (0 * 4)(INPUT), X0;
-	vpbroadcastd (1 * 4)(INPUT), X1;
-	vpbroadcastd (2 * 4)(INPUT), X2;
-	vpbroadcastd (3 * 4)(INPUT), X3;
-	vpbroadcastd (4 * 4)(INPUT), X4;
-	vpbroadcastd (5 * 4)(INPUT), X5;
-	vpbroadcastd (6 * 4)(INPUT), X6;
-	vpbroadcastd (7 * 4)(INPUT), X7;
-	vpbroadcastd (8 * 4)(INPUT), X8;
-	vpbroadcastd (9 * 4)(INPUT), X9;
-	vpbroadcastd (10 * 4)(INPUT), X10;
-	vpbroadcastd (11 * 4)(INPUT), X11;
-	vpbroadcastd (14 * 4)(INPUT), X14;
-	vpbroadcastd (15 * 4)(INPUT), X15;
-	vmovdqa X15, (STACK_TMP)(%rsp);
-
-L(round2):
-	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X15,,,,)
-	vmovdqa (STACK_TMP)(%rsp), X15;
-	vmovdqa X8, (STACK_TMP)(%rsp);
-	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,,,,)
-	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,,,,)
-	vmovdqa (STACK_TMP)(%rsp), X8;
-	vmovdqa X15, (STACK_TMP)(%rsp);
-	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X15,,,,)
-	sub $2, ROUND;
-	jnz L(round2);
-
-	vmovdqa X8, (STACK_TMP1)(%rsp);
-
-	/* tmp := X15 */
-	vpbroadcastd (0 * 4)(INPUT), X15;
-	PLUS(X0, X15);
-	vpbroadcastd (1 * 4)(INPUT), X15;
-	PLUS(X1, X15);
-	vpbroadcastd (2 * 4)(INPUT), X15;
-	PLUS(X2, X15);
-	vpbroadcastd (3 * 4)(INPUT), X15;
-	PLUS(X3, X15);
-	vpbroadcastd (4 * 4)(INPUT), X15;
-	PLUS(X4, X15);
-	vpbroadcastd (5 * 4)(INPUT), X15;
-	PLUS(X5, X15);
-	vpbroadcastd (6 * 4)(INPUT), X15;
-	PLUS(X6, X15);
-	vpbroadcastd (7 * 4)(INPUT), X15;
-	PLUS(X7, X15);
-	transpose_4x4(X0, X1, X2, X3, X8, X15);
-	transpose_4x4(X4, X5, X6, X7, X8, X15);
-	vmovdqa (STACK_TMP1)(%rsp), X8;
-	transpose_16byte_2x2(X0, X4, X15);
-	transpose_16byte_2x2(X1, X5, X15);
-	transpose_16byte_2x2(X2, X6, X15);
-	transpose_16byte_2x2(X3, X7, X15);
-	vmovdqa (STACK_TMP)(%rsp), X15;
-	vmovdqu X0, (64 * 0 + 16 * 0)(DST)
-	vmovdqu X1, (64 * 1 + 16 * 0)(DST)
-	vpbroadcastd (8 * 4)(INPUT), X0;
-	PLUS(X8, X0);
-	vpbroadcastd (9 * 4)(INPUT), X0;
-	PLUS(X9, X0);
-	vpbroadcastd (10 * 4)(INPUT), X0;
-	PLUS(X10, X0);
-	vpbroadcastd (11 * 4)(INPUT), X0;
-	PLUS(X11, X0);
-	vmovdqa (STACK_VEC_X12)(%rsp), X0;
-	PLUS(X12, X0);
-	vmovdqa (STACK_VEC_X13)(%rsp), X0;
-	PLUS(X13, X0);
-	vpbroadcastd (14 * 4)(INPUT), X0;
-	PLUS(X14, X0);
-	vpbroadcastd (15 * 4)(INPUT), X0;
-	PLUS(X15, X0);
-	vmovdqu X2, (64 * 2 + 16 * 0)(DST)
-	vmovdqu X3, (64 * 3 + 16 * 0)(DST)
-
-	/* Update counter */
-	addq $8, (12 * 4)(INPUT);
-
-	transpose_4x4(X8, X9, X10, X11, X0, X1);
-	transpose_4x4(X12, X13, X14, X15, X0, X1);
-	vmovdqu X4, (64 * 4 + 16 * 0)(DST)
-	vmovdqu X5, (64 * 5 + 16 * 0)(DST)
-	transpose_16byte_2x2(X8, X12, X0);
-	transpose_16byte_2x2(X9, X13, X0);
-	transpose_16byte_2x2(X10, X14, X0);
-	transpose_16byte_2x2(X11, X15, X0);
-	vmovdqu X6,  (64 * 6 + 16 * 0)(DST)
-	vmovdqu X7,  (64 * 7 + 16 * 0)(DST)
-	vmovdqu X8,  (64 * 0 + 16 * 2)(DST)
-	vmovdqu X9,  (64 * 1 + 16 * 2)(DST)
-	vmovdqu X10, (64 * 2 + 16 * 2)(DST)
-	vmovdqu X11, (64 * 3 + 16 * 2)(DST)
-	vmovdqu X12, (64 * 4 + 16 * 2)(DST)
-	vmovdqu X13, (64 * 5 + 16 * 2)(DST)
-	vmovdqu X14, (64 * 6 + 16 * 2)(DST)
-	vmovdqu X15, (64 * 7 + 16 * 2)(DST)
-
-	sub $8, NBLKS;
-	lea (8 * 64)(DST), DST;
-	lea (8 * 64)(SRC), SRC;
-	jnz L(loop8);
-
-	vzeroupper;
-
-	/* eax zeroed by round loop. */
-	leave;
-	cfi_adjust_cfa_offset(-8)
-	cfi_def_cfa_register(%rsp);
-	ret;
-	int3;
-END(__chacha20_avx2_blocks8)
diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S
deleted file mode 100644
index 351a1109c6..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-sse2.S
+++ /dev/null
@@ -1,311 +0,0 @@
-/* Optimized SSE2 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* chacha20-amd64-ssse3.S  -  SSSE3 implementation of ChaCha20 cipher
-
-   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
-*/
-
-/* Based on D. J. Bernstein reference implementation at
-   http://cr.yp.to/chacha.html:
-
-   chacha-regs.c version 20080118
-   D. J. Bernstein
-   Public domain.  */
-
-#include <sysdep.h>
-#include <isa-level.h>
-
-#if MINIMUM_X86_ISA_LEVEL <= 2
-
-#ifdef PIC
-#  define rRIP (%rip)
-#else
-#  define rRIP
-#endif
-
-/* 'ret' instruction replacement for straight-line speculation mitigation */
-#define ret_spec_stop \
-        ret; int3;
-
-/* register macros */
-#define INPUT %rdi
-#define DST   %rsi
-#define SRC   %rdx
-#define NBLKS %rcx
-#define ROUND %eax
-
-/* stack structure */
-#define STACK_VEC_X12 (16)
-#define STACK_VEC_X13 (16 + STACK_VEC_X12)
-#define STACK_TMP     (16 + STACK_VEC_X13)
-#define STACK_TMP1    (16 + STACK_TMP)
-#define STACK_TMP2    (16 + STACK_TMP1)
-
-#define STACK_MAX     (16 + STACK_TMP2)
-
-/* vector registers */
-#define X0 %xmm0
-#define X1 %xmm1
-#define X2 %xmm2
-#define X3 %xmm3
-#define X4 %xmm4
-#define X5 %xmm5
-#define X6 %xmm6
-#define X7 %xmm7
-#define X8 %xmm8
-#define X9 %xmm9
-#define X10 %xmm10
-#define X11 %xmm11
-#define X12 %xmm12
-#define X13 %xmm13
-#define X14 %xmm14
-#define X15 %xmm15
-
-/**********************************************************************
-  helper macros
- **********************************************************************/
-
-/* 4x4 32-bit integer matrix transpose */
-#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \
-	movdqa    x0, t2; \
-	punpckhdq x1, t2; \
-	punpckldq x1, x0; \
-	\
-	movdqa    x2, t1; \
-	punpckldq x3, t1; \
-	punpckhdq x3, x2; \
-	\
-	movdqa     x0, x1; \
-	punpckhqdq t1, x1; \
-	punpcklqdq t1, x0; \
-	\
-	movdqa     t2, x3; \
-	punpckhqdq x2, x3; \
-	punpcklqdq x2, t2; \
-	movdqa     t2, x2;
-
-/* fill xmm register with 32-bit value from memory */
-#define PBROADCASTD(mem32, xreg) \
-	movd mem32, xreg; \
-	pshufd $0, xreg, xreg;
-
-/**********************************************************************
-  4-way chacha20
- **********************************************************************/
-
-#define ROTATE2(v1,v2,c,tmp1,tmp2)	\
-	movdqa v1, tmp1; 		\
-	movdqa v2, tmp2; 		\
-	psrld $(32 - (c)), v1;		\
-	pslld $(c), tmp1;		\
-	paddb tmp1, v1;			\
-	psrld $(32 - (c)), v2;		\
-	pslld $(c), tmp2;		\
-	paddb tmp2, v2;
-
-#define XOR(ds,s) \
-	pxor s, ds;
-
-#define PLUS(ds,s) \
-	paddd s, ds;
-
-#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2)	\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE2(d1, d2, 16, tmp1, tmp2);			\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2, 12, tmp1, tmp2);			\
-	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
-	    ROTATE2(d1, d2, 8, tmp1, tmp2);			\
-	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
-	    ROTATE2(b1, b2,  7, tmp1, tmp2);
-
-	.section .text.sse2,"ax",@progbits
-
-chacha20_data:
-	.align 16
-L(counter1):
-	.long 1,0,0,0
-L(inc_counter):
-	.long 0,1,2,3
-L(unsigned_cmp):
-	.long 0x80000000,0x80000000,0x80000000,0x80000000
-
-	.hidden __chacha20_sse2_blocks4
-ENTRY (__chacha20_sse2_blocks4)
-	/* input:
-	 *	%rdi: input
-	 *	%rsi: dst
-	 *	%rdx: src
-	 *	%rcx: nblks (multiple of 4)
-	 */
-
-	pushq %rbp;
-	cfi_adjust_cfa_offset(8);
-	cfi_rel_offset(rbp, 0)
-	movq %rsp, %rbp;
-	cfi_def_cfa_register(%rbp);
-
-	subq $STACK_MAX, %rsp;
-	andq $~15, %rsp;
-
-L(loop4):
-	mov $20, ROUND;
-
-	/* Construct counter vectors X12 and X13 */
-	movdqa L(inc_counter) rRIP, X0;
-	movdqa L(unsigned_cmp) rRIP, X2;
-	PBROADCASTD((12 * 4)(INPUT), X12);
-	PBROADCASTD((13 * 4)(INPUT), X13);
-	paddd X0, X12;
-	movdqa X12, X1;
-	pxor X2, X0;
-	pxor X2, X1;
-	pcmpgtd X1, X0;
-	psubd X0, X13;
-	movdqa X12, (STACK_VEC_X12)(%rsp);
-	movdqa X13, (STACK_VEC_X13)(%rsp);
-
-	/* Load vectors */
-	PBROADCASTD((0 * 4)(INPUT), X0);
-	PBROADCASTD((1 * 4)(INPUT), X1);
-	PBROADCASTD((2 * 4)(INPUT), X2);
-	PBROADCASTD((3 * 4)(INPUT), X3);
-	PBROADCASTD((4 * 4)(INPUT), X4);
-	PBROADCASTD((5 * 4)(INPUT), X5);
-	PBROADCASTD((6 * 4)(INPUT), X6);
-	PBROADCASTD((7 * 4)(INPUT), X7);
-	PBROADCASTD((8 * 4)(INPUT), X8);
-	PBROADCASTD((9 * 4)(INPUT), X9);
-	PBROADCASTD((10 * 4)(INPUT), X10);
-	PBROADCASTD((11 * 4)(INPUT), X11);
-	PBROADCASTD((14 * 4)(INPUT), X14);
-	PBROADCASTD((15 * 4)(INPUT), X15);
-	movdqa X11, (STACK_TMP)(%rsp);
-	movdqa X15, (STACK_TMP1)(%rsp);
-
-L(round2_4):
-	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X11,X15)
-	movdqa (STACK_TMP)(%rsp), X11;
-	movdqa (STACK_TMP1)(%rsp), X15;
-	movdqa X8, (STACK_TMP)(%rsp);
-	movdqa X9, (STACK_TMP1)(%rsp);
-	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,X9)
-	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,X9)
-	movdqa (STACK_TMP)(%rsp), X8;
-	movdqa (STACK_TMP1)(%rsp), X9;
-	movdqa X11, (STACK_TMP)(%rsp);
-	movdqa X15, (STACK_TMP1)(%rsp);
-	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X11,X15)
-	sub $2, ROUND;
-	jnz L(round2_4);
-
-	/* tmp := X15 */
-	movdqa (STACK_TMP)(%rsp), X11;
-	PBROADCASTD((0 * 4)(INPUT), X15);
-	PLUS(X0, X15);
-	PBROADCASTD((1 * 4)(INPUT), X15);
-	PLUS(X1, X15);
-	PBROADCASTD((2 * 4)(INPUT), X15);
-	PLUS(X2, X15);
-	PBROADCASTD((3 * 4)(INPUT), X15);
-	PLUS(X3, X15);
-	PBROADCASTD((4 * 4)(INPUT), X15);
-	PLUS(X4, X15);
-	PBROADCASTD((5 * 4)(INPUT), X15);
-	PLUS(X5, X15);
-	PBROADCASTD((6 * 4)(INPUT), X15);
-	PLUS(X6, X15);
-	PBROADCASTD((7 * 4)(INPUT), X15);
-	PLUS(X7, X15);
-	PBROADCASTD((8 * 4)(INPUT), X15);
-	PLUS(X8, X15);
-	PBROADCASTD((9 * 4)(INPUT), X15);
-	PLUS(X9, X15);
-	PBROADCASTD((10 * 4)(INPUT), X15);
-	PLUS(X10, X15);
-	PBROADCASTD((11 * 4)(INPUT), X15);
-	PLUS(X11, X15);
-	movdqa (STACK_VEC_X12)(%rsp), X15;
-	PLUS(X12, X15);
-	movdqa (STACK_VEC_X13)(%rsp), X15;
-	PLUS(X13, X15);
-	movdqa X13, (STACK_TMP)(%rsp);
-	PBROADCASTD((14 * 4)(INPUT), X15);
-	PLUS(X14, X15);
-	movdqa (STACK_TMP1)(%rsp), X15;
-	movdqa X14, (STACK_TMP1)(%rsp);
-	PBROADCASTD((15 * 4)(INPUT), X13);
-	PLUS(X15, X13);
-	movdqa X15, (STACK_TMP2)(%rsp);
-
-	/* Update counter */
-	addq $4, (12 * 4)(INPUT);
-
-	TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15);
-	movdqu X0, (64 * 0 + 16 * 0)(DST)
-	movdqu X1, (64 * 1 + 16 * 0)(DST)
-	movdqu X2, (64 * 2 + 16 * 0)(DST)
-	movdqu X3, (64 * 3 + 16 * 0)(DST)
-	TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2);
-	movdqa (STACK_TMP)(%rsp), X13;
-	movdqa (STACK_TMP1)(%rsp), X14;
-	movdqa (STACK_TMP2)(%rsp), X15;
-	movdqu X4, (64 * 0 + 16 * 1)(DST)
-	movdqu X5, (64 * 1 + 16 * 1)(DST)
-	movdqu X6, (64 * 2 + 16 * 1)(DST)
-	movdqu X7, (64 * 3 + 16 * 1)(DST)
-	TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2);
-	movdqu X8,  (64 * 0 + 16 * 2)(DST)
-	movdqu X9,  (64 * 1 + 16 * 2)(DST)
-	movdqu X10, (64 * 2 + 16 * 2)(DST)
-	movdqu X11, (64 * 3 + 16 * 2)(DST)
-	TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2);
-	movdqu X12, (64 * 0 + 16 * 3)(DST)
-	movdqu X13, (64 * 1 + 16 * 3)(DST)
-	movdqu X14, (64 * 2 + 16 * 3)(DST)
-	movdqu X15, (64 * 3 + 16 * 3)(DST)
-
-	sub $4, NBLKS;
-	lea (4 * 64)(DST), DST;
-	lea (4 * 64)(SRC), SRC;
-	jnz L(loop4);
-
-	/* eax zeroed by round loop. */
-	leave;
-	cfi_adjust_cfa_offset(-8)
-	cfi_def_cfa_register(%rsp);
-	ret_spec_stop;
-END (__chacha20_sse2_blocks4)
-
-#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */
diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
deleted file mode 100644
index 6f3784e392..0000000000
--- a/sysdeps/x86_64/chacha20_arch.h
+++ /dev/null
@@ -1,55 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <isa-level.h>
-#include <ldsodefs.h>
-#include <cpu-features.h>
-#include <sys/param.h>
-
-unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
-				      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4 or 8");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
-		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-#if MINIMUM_X86_ISA_LEVEL > 2
-  __chacha20_avx2_blocks8 (state, dst, src,
-			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  const struct cpu_features* cpu_features = __get_cpu_features ();
-
-  /* AVX2 version uses vzeroupper, so disable it if RTM is enabled.  */
-  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
-      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !))
-    __chacha20_avx2_blocks8 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    __chacha20_sse2_blocks4 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#endif
-}
-- 
2.35.1
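
For readers skimming the removed SSE2 file above, here is a minimal scalar
sketch in C of what the QUARTERROUND2/ROTATE2 macros and the "Construct
counter vectors" step compute for a single lane. The function names
(rotl32, quarter_round, counter_lanes) are hypothetical and do not come
from the patch; this illustrates the arithmetic only, not the removed
implementation itself.

```c
#include <stdint.h>

/* Rotate left by C bits (1 <= C <= 31).  The two shifted halves never
   share a set bit, so adding them gives the same result as OR-ing them;
   that is why the assembly can merge them with paddb.  */
static inline uint32_t
rotl32 (uint32_t v, unsigned int c)
{
  return (v << c) + (v >> (32 - c));
}

/* One ChaCha quarter round on four state words: add, xor, rotate with
   the rotation counts 16, 12, 8 and 7, as in QUARTERROUND2.  */
static void
quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 7);
}

/* Per-lane block counters for four parallel blocks.  Lane i uses the
   64-bit counter (hi:lo) + i.  The carry into the high word is the
   unsigned comparison sum < i.  SSE2 only has a signed 32-bit compare
   (pcmpgtd), so the assembly flips the sign bit of both operands with
   0x80000000 first, which turns the signed compare into an unsigned
   one.  */
static void
counter_lanes (uint32_t lo, uint32_t hi,
	       uint32_t lo_out[4], uint32_t hi_out[4])
{
  for (uint32_t i = 0; i < 4; i++)
    {
      uint32_t sum = lo + i;	/* Wraps modulo 2^32.  */
      uint32_t carry = sum < i;	/* 1 if the addition wrapped.  */
      lo_out[i] = sum;
      hi_out[i] = hi + carry;
    }
}
```

The vector code performs the same steps on four (SSE2) or eight (AVX2)
blocks at once, one block per 32-bit lane, and transposes the result
before storing it.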



* Re: [PATCH v6] arc4random: simplify design for better safety
  2022-07-26 19:58         ` [PATCH v6] " Jason A. Donenfeld
@ 2022-07-26 20:17           ` Adhemerval Zanella Netto
  2022-07-26 20:56             ` Adhemerval Zanella Netto
  2022-07-28 10:29           ` Szabolcs Nagy
  1 sibling, 1 reply; 81+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-26 20:17 UTC (permalink / raw)
  To: Jason A. Donenfeld, libc-alpha
  Cc: Florian Weimer, Cristian Rodríguez, Paul Eggert,
	Mark Harris, Eric Biggers, linux-crypto



On 26/07/22 16:58, Jason A. Donenfeld wrote:
> Rather than buffering 16 MiB of entropy in userspace (by way of
> chacha20), simply call getrandom() every time.
> 
> This approach is doubtlessly slower, for now, but trying to prematurely
> optimize arc4random appears to be leading toward all sorts of nasty
> properties and gotchas. Instead, this patch takes a much more
> conservative approach. The interface is added as a basic loop wrapper
> around getrandom(), and then later, the kernel and libc can work
> together on optimizing that.
> 
> This prevents numerous issues in which userspace is unaware of when it
> really must throw away its buffer, since we avoid buffering
> altogether. Future improvements may include userspace learning more from
> the kernel about when to do that, which might make these sorts of
> chacha20-based optimizations more possible. The current heuristic of 16
> MiB is meaningless garbage that doesn't correspond to anything the
> kernel might know about. So for now, let's just do something
> conservative that we know is correct and won't lead to cryptographic
> issues for users of this function.
> 
> This patch might be considered along the lines of, "optimization is the
> root of all evil," in that the much more complex implementation it
> replaces moves too fast without considering security implications,
> whereas the incremental approach done here is a much safer way of going
> about things. Once this lands, we can take our time in optimizing this
> properly using new interplay between the kernel and userspace.
> 
> getrandom(0) is used, since that's the one that ensures the bytes
> returned are cryptographically secure. But on systems without it, we
> fall back to using /dev/urandom. This is unfortunate because it means
> opening a file descriptor, but there's not much of a choice. Secondly,
> as part of the fallback, in order to get more or less the same
> properties of getrandom(0), we poll on /dev/random, and if the poll
> succeeds at least once, then we assume the RNG is initialized. This is a
> rough approximation, as the ancient "non-blocking pool" initialized
> after the "blocking pool", not before, and it may not port back to all
> ancient kernels, though it does to all kernels supported by glibc
> (≥3.2), so generally it's the best approximation we can do.
> 
> The motivation for including arc4random, in the first place, is to have
> source-level compatibility with existing code. That means this patch
> doesn't attempt to litigate the interface itself. It does, however,
> choose a conservative approach for implementing it.
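
For reference, the flow described above (loop on getrandom(0), fall back
to /dev/urandom on ENOSYS after polling /dev/random once) can be sketched
with public interfaces only. This is a rough illustration under stated
assumptions, not the patch itself: the names open_retry and
random_buf_sketch are hypothetical, abort() stands in for the fatal-error
path, and the patch's static seen_initialized flag, which skips the poll
after the first success, is left out for brevity.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1
#endif
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open PATH read-only, retrying on EINTR.  */
static int
open_retry (const char *path)
{
  int fd;
  do
    fd = open (path, O_RDONLY | O_CLOEXEC | O_NOCTTY);
  while (fd < 0 && errno == EINTR);
  return fd;
}

/* Hypothetical name.  Fill P with N random bytes or abort.  */
static void
random_buf_sketch (void *p, size_t n)
{
  unsigned char *out = p;

  /* Preferred path: getrandom with flags 0 blocks until the kernel RNG
     is initialized.  A short return means we were interrupted, so keep
     going with the remainder.  */
  while (n > 0)
    {
      ssize_t l = getrandom (out, n, 0);
      if (l > 0)
	{
	  out += l;
	  n -= (size_t) l;
	}
      else if (l < 0 && errno == EINTR)
	continue;
      else if (l < 0 && errno == ENOSYS)
	break;			/* No syscall: use the fallback below.  */
      else
	abort ();
    }
  if (n == 0)
    return;

  /* Fallback, step 1: wait until /dev/random is readable as an
     approximation of "the RNG is initialized".  */
  struct pollfd pfd = { .fd = open_retry ("/dev/random"), .events = POLLIN };
  if (pfd.fd < 0)
    abort ();
  int r;
  do
    r = poll (&pfd, 1, -1);
  while (r < 0 && errno == EINTR);
  if (r < 0 || close (pfd.fd) < 0)
    abort ();

  /* Fallback, step 2: read the remaining bytes from /dev/urandom.  */
  int fd = open_retry ("/dev/urandom");
  if (fd < 0)
    abort ();
  while (n > 0)
    {
      ssize_t l = read (fd, out, n);
      if (l < 0 && errno == EINTR)
	continue;
      if (l <= 0)
	abort ();
      out += l;
      n -= (size_t) l;
    }
  if (close (fd) < 0)
    abort ();
}
```

Note that the return value of getrandom and read is kept in a signed
ssize_t here so that the error checks (l < 0) are meaningful.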

LGTM, I agree this is a safe solution for 2.36; we can optimize it later
if that turns out to be necessary.

I will run some tests and push it upstream.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> 
> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Florian Weimer <fweimer@redhat.com>
> Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
> Cc: Paul Eggert <eggert@cs.ucla.edu>
> Cc: Mark Harris <mark.hsj@gmail.com>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Cc: linux-crypto@vger.kernel.org
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  LICENSES                                      |  23 -
>  NEWS                                          |   4 +-
>  include/stdlib.h                              |   3 -
>  manual/math.texi                              |  13 +-
>  stdlib/Makefile                               |   2 -
>  stdlib/arc4random.c                           | 196 ++----
>  stdlib/arc4random.h                           |  48 --
>  stdlib/chacha20.c                             | 191 ------
>  stdlib/tst-arc4random-chacha20.c              | 167 -----
>  sysdeps/aarch64/Makefile                      |   4 -
>  sysdeps/aarch64/chacha20-aarch64.S            | 314 ----------
>  sysdeps/aarch64/chacha20_arch.h               |  40 --
>  sysdeps/generic/chacha20_arch.h               |  24 -
>  sysdeps/generic/not-cancel.h                  |   3 +
>  sysdeps/generic/tls-internal-struct.h         |   1 -
>  sysdeps/generic/tls-internal.c                |  10 -
>  sysdeps/mach/hurd/_Fork.c                     |   2 -
>  sysdeps/mach/hurd/not-cancel.h                |   4 +
>  sysdeps/nptl/_Fork.c                          |   2 -
>  .../powerpc/powerpc64/be/multiarch/Makefile   |   4 -
>  .../powerpc64/be/multiarch/chacha20-ppc.c     |   1 -
>  .../powerpc64/be/multiarch/chacha20_arch.h    |  42 --
>  sysdeps/powerpc/powerpc64/power8/Makefile     |   5 -
>  .../powerpc/powerpc64/power8/chacha20-ppc.c   | 256 --------
>  .../powerpc/powerpc64/power8/chacha20_arch.h  |  37 --
>  sysdeps/s390/s390-64/Makefile                 |   6 -
>  sysdeps/s390/s390-64/chacha20-s390x.S         | 573 ------------------
>  sysdeps/s390/s390-64/chacha20_arch.h          |  45 --
>  sysdeps/unix/sysv/linux/not-cancel.h          |   8 +-
>  sysdeps/unix/sysv/linux/tls-internal.c        |  10 -
>  sysdeps/unix/sysv/linux/tls-internal.h        |   1 -
>  sysdeps/x86_64/Makefile                       |   7 -
>  sysdeps/x86_64/chacha20-amd64-avx2.S          | 328 ----------
>  sysdeps/x86_64/chacha20-amd64-sse2.S          | 311 ----------
>  sysdeps/x86_64/chacha20_arch.h                |  55 --
>  35 files changed, 64 insertions(+), 2676 deletions(-)
>  delete mode 100644 stdlib/arc4random.h
>  delete mode 100644 stdlib/chacha20.c
>  delete mode 100644 stdlib/tst-arc4random-chacha20.c
>  delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S
>  delete mode 100644 sysdeps/aarch64/chacha20_arch.h
>  delete mode 100644 sysdeps/generic/chacha20_arch.h
>  delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
>  delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
>  delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
>  delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
>  delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
>  delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
>  delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
>  delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
>  delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
>  delete mode 100644 sysdeps/x86_64/chacha20_arch.h
> 
> diff --git a/LICENSES b/LICENSES
> index cd04fb6e84..530893b1dc 100644
> --- a/LICENSES
> +++ b/LICENSES
> @@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov>
>   You should have received a copy of the GNU Lesser General Public
>   License along with this library; if not, see
>   <https://www.gnu.org/licenses/>.  */
> -\f
> -sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S,
> -sysdeps/x86_64/chacha20-amd64-avx2.S, and
> -sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and
> -sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt,
> -with the following notices:
> -
> -Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
> -
> -This file is part of Libgcrypt.
> -
> -Libgcrypt is free software; you can redistribute it and/or modify
> -it under the terms of the GNU Lesser General Public License as
> -published by the Free Software Foundation; either version 2.1 of
> -the License, or (at your option) any later version.
> -
> -Libgcrypt is distributed in the hope that it will be useful,
> -but WITHOUT ANY WARRANTY; without even the implied warranty of
> -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -GNU Lesser General Public License for more details.
> -
> -You should have received a copy of the GNU Lesser General Public
> -License along with this program; if not, see <https://www.gnu.org/licenses/>.

Ok.

> diff --git a/NEWS b/NEWS
> index 8420a65cd0..fe531bfe1e 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -61,8 +61,8 @@ Major new features:
>    is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type).
>  
>  * The functions arc4random, arc4random_buf, and arc4random_uniform have been
> -  added.  The functions use a pseudo-random number generator along with
> -  entropy from the kernel.
> +  added.  The functions wrap getrandom and/or /dev/urandom to return high-
> +  quality randomness from the kernel.
>  
>  Deprecated and removed features, and other changes affecting compatibility:
>  
> diff --git a/include/stdlib.h b/include/stdlib.h
> index cae7f7cdf8..db51f4a4f6 100644
> --- a/include/stdlib.h
> +++ b/include/stdlib.h
> @@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform;
>  libc_hidden_proto (__arc4random_uniform);
>  extern void __arc4random_buf_internal (void *buffer, size_t len)
>       attribute_hidden;
> -/* Called from the fork function to reinitialize the internal cipher state
> -   in child process.  */
> -extern void __arc4random_fork_subprocess (void) attribute_hidden;
>  
>  extern double __strtod_internal (const char *__restrict __nptr,
>  				 char **__restrict __endptr, int __group)

Ok.

> diff --git a/manual/math.texi b/manual/math.texi
> index 141695cc30..6d69bbff66 100644
> --- a/manual/math.texi
> +++ b/manual/math.texi
> @@ -1993,17 +1993,10 @@ This section describes the random number functions provided as a GNU
>  extension, based on OpenBSD interfaces.
>  
>  @Theglibc{} uses kernel entropy obtained either through @code{getrandom}
> -or by reading @file{/dev/urandom} to seed and periodically re-seed the
> -internal state.  A per-thread data pool is used, which allows fast output
> -generation.
> +or by reading @file{/dev/urandom} to seed.
>  
> -Although these functions provide higher random quality than ISO, BSD, and
> -SVID functions, these still use a Pseudo-Random generator and should not
> -be used in cryptographic contexts.
> -
> -The internal state is cleared and reseeded with kernel entropy on @code{fork}
> -and @code{_Fork}.  It is not cleared on either a direct @code{clone} syscall
> -or when using @theglibc{} @code{syscall} function.
> +These functions provide higher random quality than ISO, BSD, and SVID
> +functions, and may be used in cryptographic contexts.
>  
>  The prototypes for these functions are in @file{stdlib.h}.
>  @pindex stdlib.h
> diff --git a/stdlib/Makefile b/stdlib/Makefile
> index a900962685..f7b25c1981 100644
> --- a/stdlib/Makefile
> +++ b/stdlib/Makefile
> @@ -246,7 +246,6 @@ tests := \
>    # tests
>  
>  tests-internal := \
> -  tst-arc4random-chacha20 \
>    tst-strtod1i \
>    tst-strtod3 \
>    tst-strtod4 \
> @@ -256,7 +255,6 @@ tests-internal := \
>    # tests-internal
>  
>  tests-static := \
> -  tst-arc4random-chacha20 \
>    tst-secure-getenv \
>    # tests-static
>  

Ok.

> diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
> index 65547e79aa..0cb9991328 100644
> --- a/stdlib/arc4random.c
> +++ b/stdlib/arc4random.c
> @@ -1,4 +1,4 @@
> -/* Pseudo Random Number Generator based on ChaCha20.
> +/* Pseudo Random Number Generator
>     Copyright (C) 2022 Free Software Foundation, Inc.
>     This file is part of the GNU C Library.
>  
> @@ -16,7 +16,6 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> -#include <arc4random.h>
>  #include <errno.h>
>  #include <not-cancel.h>
>  #include <stdio.h>
> @@ -24,53 +23,6 @@
>  #include <sys/mman.h>
>  #include <sys/param.h>
>  #include <sys/random.h>
> -#include <tls-internal.h>
> -
> -/* arc4random keeps two counters: 'have' is the current valid bytes not yet
> -   consumed in 'buf' while 'count' is the maximum number of bytes until a
> -   reseed.
> -
> -   Both the initial seed and reseed try to obtain entropy from the kernel
> -   and abort the process if none could be obtained.
> -
> -   The state 'buf' improves the usage of the cipher calls, allowing to call
> -   optimized implementations (if the architecture provides it) and minimize
> -   function call overhead.  */
> -
> -#include <chacha20.c>
> -
> -/* Called from the fork function to reset the state.  */
> -void
> -__arc4random_fork_subprocess (void)
> -{
> -  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
> -  if (state != NULL)
> -    {
> -      explicit_bzero (state, sizeof (*state));
> -      /* Force key init.  */
> -      state->count = -1;
> -    }
> -}
> -
> -/* Return the current thread random state or try to create one if there is
> -   none available.  In the case malloc can not allocate a state, arc4random
> -   will try to get entropy with arc4random_getentropy.  */
> -static struct arc4random_state_t *
> -arc4random_get_state (void)
> -{
> -  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
> -  if (state == NULL)
> -    {
> -      state = malloc (sizeof (struct arc4random_state_t));
> -      if (state != NULL)
> -	{
> -	  /* Force key initialization on first call.  */
> -	  state->count = -1;
> -	  __glibc_tls_internal ()->rand_state = state;
> -	}
> -    }
> -  return state;
> -}
>  
>  static void
>  arc4random_getrandom_failure (void)
> @@ -78,106 +30,63 @@ arc4random_getrandom_failure (void)
>    __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
>  }
>  
> -static void
> -arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen)
> +void
> +__arc4random_buf (void *p, size_t n)
>  {
> -  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
> -
> -  /* Mix optional user provided data.  */
> -  if (rnd != NULL)
> -    {
> -      size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> -      for (size_t i = 0; i < m; i++)
> -	state->buf[i] ^= rnd[i];
> -    }
> -
> -  /* Immediately reinit for backtracking resistance.  */
> -  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
> -  explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> -  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> -}
> +  static int seen_initialized;
> +  size_t l;
> +  int fd;
>  
> -static void
> -arc4random_getentropy (void *rnd, size_t len)
> -{
> -  if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len)
> +  if (n == 0)
>      return;
>  
> -  int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom",
> -						  O_RDONLY | O_CLOEXEC));
> -  if (fd != -1)
> +  for (;;)
>      {
> -      uint8_t *p = rnd;
> -      uint8_t *end = p + len;
> -      do
> +      l = TEMP_FAILURE_RETRY (__getrandom_nocancel (p, n, 0));
> +      if (l > 0)
>  	{
> -	  ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
> -	  if (ret <= 0)
> -	    arc4random_getrandom_failure ();
> -	  p += ret;
> +	  if ((size_t) l == n)
> +	    return; /* Done reading, success.  */
> +	  p = (uint8_t *) p + l;
> +	  n -= l;
> +	  continue; /* Interrupted by a signal; keep going.  */
>  	}
> -      while (p < end);
> -
> -      if (__close_nocancel (fd) == 0)
> -	return;
> +      else if (l < 0 && errno == ENOSYS)
> +	break; /* No syscall, so fallback to /dev/urandom.  */
> +      arc4random_getrandom_failure ();
>      }
> -  arc4random_getrandom_failure ();
> -}
>  
> -/* Check if the thread context STATE should be reseed with kernel entropy
> -   depending of requested LEN bytes.  If there is less than requested,
> -   the state is either initialized or reseeded, otherwise the internal
> -   counter subtract the requested length.  */
> -static void
> -arc4random_check_stir (struct arc4random_state_t *state, size_t len)
> -{
> -  if (state->count <= len || state->count == -1)
> +  if (!atomic_load_relaxed (&seen_initialized))
>      {
> -      uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
> -      arc4random_getentropy (rnd, sizeof rnd);
> -
> -      if (state->count == -1)
> -	chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
> -      else
> -	arc4random_rekey (state, rnd, sizeof rnd);
> -
> -      explicit_bzero (rnd, sizeof rnd);
> -
> -      /* Invalidate the buf.  */
> -      state->have = 0;
> -      memset (state->buf, 0, sizeof state->buf);
> -      state->count = CHACHA20_RESEED_SIZE;
> +      /* Poll /dev/random as an approximation of RNG initialization.  */
> +      struct pollfd pfd = { .events = POLLIN };
> +      pfd.fd = TEMP_FAILURE_RETRY (
> +	  __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> +      if (pfd.fd < 0)
> +	arc4random_getrandom_failure ();
> +      if (TEMP_FAILURE_RETRY (__poll_infinity_nocancel (&pfd, 1)) < 0)
> +	arc4random_getrandom_failure ();
> +      if (__close_nocancel (pfd.fd) < 0)
> +	arc4random_getrandom_failure ();
> +      atomic_store_relaxed (&seen_initialized, 1);
>      }
> -  else
> -    state->count -= len;
> -}
>  
> -void
> -__arc4random_buf (void *buffer, size_t len)
> -{
> -  struct arc4random_state_t *state = arc4random_get_state ();
> -  if (__glibc_unlikely (state == NULL))
> -    {
> -      arc4random_getentropy (buffer, len);
> -      return;
> -    }
> -
> -  arc4random_check_stir (state, len);
> -  while (len > 0)
> +  fd = TEMP_FAILURE_RETRY (
> +      __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> +  if (fd < 0)
> +    arc4random_getrandom_failure ();
> +  for (;;)
>      {
> -      if (state->have > 0)
> -	{
> -	  size_t m = MIN (len, state->have);
> -	  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> -	  memcpy (buffer, ks, m);
> -	  explicit_bzero (ks, m);
> -	  buffer += m;
> -	  len -= m;
> -	  state->have -= m;
> -	}
> -      if (state->have == 0)
> -	arc4random_rekey (state, NULL, 0);
> +      l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
> +      if (l <= 0)
> +	arc4random_getrandom_failure ();
> +      if ((size_t) l == n)
> +	break; /* Done reading, success.  */
> +      p = (uint8_t *) p + l;
> +      n -= l;
>      }
> +  if (__close_nocancel (fd) < 0)
> +    arc4random_getrandom_failure ();
>  }
>  libc_hidden_def (__arc4random_buf)
>  weak_alias (__arc4random_buf, arc4random_buf)
> @@ -186,22 +95,7 @@ uint32_t
>  __arc4random (void)
>  {
>    uint32_t r;
> -
> -  struct arc4random_state_t *state = arc4random_get_state ();
> -  if (__glibc_unlikely (state == NULL))
> -    {
> -      arc4random_getentropy (&r, sizeof (uint32_t));
> -      return r;
> -    }
> -
> -  arc4random_check_stir (state, sizeof (uint32_t));
> -  if (state->have < sizeof (uint32_t))
> -    arc4random_rekey (state, NULL, 0);
> -  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> -  memcpy (&r, ks, sizeof (uint32_t));
> -  memset (ks, 0, sizeof (uint32_t));
> -  state->have -= sizeof (uint32_t);
> -
> +  __arc4random_buf (&r, sizeof (r));
>    return r;
>  }
>  libc_hidden_def (__arc4random)

Ok.

> diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
> deleted file mode 100644
> index cd39389c19..0000000000
> --- a/stdlib/arc4random.h
> +++ /dev/null
> @@ -1,48 +0,0 @@
> -/* Arc4random definition used on TLS.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#ifndef _CHACHA20_H
> -#define _CHACHA20_H
> -
> -#include <stddef.h>
> -#include <stdint.h>
> -
> -/* Internal ChaCha20 state.  */
> -#define CHACHA20_STATE_LEN	16
> -#define CHACHA20_BLOCK_SIZE	64
> -
> -/* Maximum number bytes until reseed (16 MB).  */
> -#define CHACHA20_RESEED_SIZE	(16 * 1024 * 1024)
> -
> -/* Internal arc4random buffer, used on each feedback step so offer some
> -   backtracking protection and to allow better used of vectorized
> -   chacha20 implementations.  */
> -#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
> -
> -_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE,
> -		"CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE");
> -
> -struct arc4random_state_t
> -{
> -  uint32_t ctx[CHACHA20_STATE_LEN];
> -  size_t have;
> -  size_t count;
> -  uint8_t buf[CHACHA20_BUFSIZE];
> -};
> -
> -#endif

Ok.
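
As a reference point for the generic ChaCha20 code removed in the next
hunk, here is a standalone sketch of the quarter round that its QROUND
macro implements.  The function name chacha20_quarter_round is
hypothetical.  The rotation amounts 16, 12, 8 and 7 are the ones in the
quoted macro, and rotl32 keeps the quoted (shift, word) argument order.

```c
#include <stdint.h>

/* Rotate WORD left by SHIFT bits.  */
static inline uint32_t
rotl32 (unsigned int shift, uint32_t word)
{
  return (word << (shift & 31)) | (word >> ((-shift) & 31));
}

/* One ChaCha20 quarter round over four 32-bit state words.
   Hypothetical standalone form of the QROUND macro.  */
static void
chacha20_quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d = rotl32 (16, *d ^ *a);
  *c += *d; *b = rotl32 (12, *b ^ *c);
  *a += *b; *d = rotl32 (8, *d ^ *a);
  *c += *d; *b = rotl32 (7, *b ^ *c);
}
```

The removed chacha20_block applies this to the four columns and then
the four diagonals of the 16-word state, ten times over, which gives
the 20 rounds.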

> diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
> deleted file mode 100644
> index 2745a81315..0000000000
> --- a/stdlib/chacha20.c
> +++ /dev/null
> @@ -1,191 +0,0 @@
> -/* Generic ChaCha20 implementation (used on arc4random).
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <array_length.h>
> -#include <endian.h>
> -#include <stddef.h>
> -#include <stdint.h>
> -#include <string.h>
> -
> -/* 32-bit stream position, then 96-bit nonce.  */
> -#define CHACHA20_IV_SIZE	16
> -#define CHACHA20_KEY_SIZE	32
> -
> -#define CHACHA20_STATE_LEN	16
> -
> -/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final
> -   XOR of the keystream with the plaintext because the plaintext is a
> -   stream of zeros.  */
> -
> -enum chacha20_constants
> -{
> -  CHACHA20_CONSTANT_EXPA = 0x61707865U,
> -  CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
> -  CHACHA20_CONSTANT_2_BY = 0x79622d32U,
> -  CHACHA20_CONSTANT_TE_K = 0x6b206574U
> -};
> -
> -static inline uint32_t
> -read_unaligned_32 (const uint8_t *p)
> -{
> -  uint32_t r;
> -  memcpy (&r, p, sizeof (r));
> -  return r;
> -}
> -
> -static inline void
> -write_unaligned_32 (uint8_t *p, uint32_t v)
> -{
> -  memcpy (p, &v, sizeof (v));
> -}
> -
> -#if __BYTE_ORDER == __BIG_ENDIAN
> -# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
> -# define set_state(v)		__builtin_bswap32 ((v))
> -#else
> -# define read_unaligned_le32(p) read_unaligned_32 ((p))
> -# define set_state(v)		(v)
> -#endif
> -
> -static inline void
> -chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
> -{
> -  state[0]  = CHACHA20_CONSTANT_EXPA;
> -  state[1]  = CHACHA20_CONSTANT_ND_3;
> -  state[2]  = CHACHA20_CONSTANT_2_BY;
> -  state[3]  = CHACHA20_CONSTANT_TE_K;
> -
> -  state[4]  = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
> -  state[5]  = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
> -  state[6]  = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
> -  state[7]  = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
> -  state[8]  = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
> -  state[9]  = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
> -  state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
> -  state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
> -
> -  state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
> -  state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
> -  state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
> -  state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
> -}
> -
> -static inline uint32_t
> -rotl32 (unsigned int shift, uint32_t word)
> -{
> -  return (word << (shift & 31)) | (word >> ((-shift) & 31));
> -}
> -
> -static void
> -state_final (const uint8_t *src, uint8_t *dst, uint32_t v)
> -{
> -#ifdef CHACHA20_XOR_FINAL
> -  v ^= read_unaligned_32 (src);
> -#endif
> -  write_unaligned_32 (dst, v);
> -}
> -
> -static inline void
> -chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src)
> -{
> -  uint32_t x0, x1, x2, x3, x4, x5, x6, x7;
> -  uint32_t x8, x9, x10, x11, x12, x13, x14, x15;
> -
> -  x0 = state[0];
> -  x1 = state[1];
> -  x2 = state[2];
> -  x3 = state[3];
> -  x4 = state[4];
> -  x5 = state[5];
> -  x6 = state[6];
> -  x7 = state[7];
> -  x8 = state[8];
> -  x9 = state[9];
> -  x10 = state[10];
> -  x11 = state[11];
> -  x12 = state[12];
> -  x13 = state[13];
> -  x14 = state[14];
> -  x15 = state[15];
> -
> -  for (int i = 0; i < 20; i += 2)
> -    {
> -#define QROUND(_x0, _x1, _x2, _x3) 			\
> -  do {							\
> -   _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); 	\
> -   _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); 	\
> -   _x0 = _x0 + _x1; _x3 = rotl32 (8,  (_x0 ^ _x3));	\
> -   _x2 = _x2 + _x3; _x1 = rotl32 (7,  (_x1 ^ _x2));	\
> -  } while(0)
> -
> -      QROUND (x0, x4, x8,  x12);
> -      QROUND (x1, x5, x9,  x13);
> -      QROUND (x2, x6, x10, x14);
> -      QROUND (x3, x7, x11, x15);
> -
> -      QROUND (x0, x5, x10, x15);
> -      QROUND (x1, x6, x11, x12);
> -      QROUND (x2, x7, x8,  x13);
> -      QROUND (x3, x4, x9,  x14);
> -    }
> -
> -  state_final (&src[0], &dst[0], set_state (x0 + state[0]));
> -  state_final (&src[4], &dst[4], set_state (x1 + state[1]));
> -  state_final (&src[8], &dst[8], set_state (x2 + state[2]));
> -  state_final (&src[12], &dst[12], set_state (x3 + state[3]));
> -  state_final (&src[16], &dst[16], set_state (x4 + state[4]));
> -  state_final (&src[20], &dst[20], set_state (x5 + state[5]));
> -  state_final (&src[24], &dst[24], set_state (x6 + state[6]));
> -  state_final (&src[28], &dst[28], set_state (x7 + state[7]));
> -  state_final (&src[32], &dst[32], set_state (x8 + state[8]));
> -  state_final (&src[36], &dst[36], set_state (x9 + state[9]));
> -  state_final (&src[40], &dst[40], set_state (x10 + state[10]));
> -  state_final (&src[44], &dst[44], set_state (x11 + state[11]));
> -  state_final (&src[48], &dst[48], set_state (x12 + state[12]));
> -  state_final (&src[52], &dst[52], set_state (x13 + state[13]));
> -  state_final (&src[56], &dst[56], set_state (x14 + state[14]));
> -  state_final (&src[60], &dst[60], set_state (x15 + state[15]));
> -
> -  state[12]++;
> -}
> -
> -static void
> -__attribute_maybe_unused__
> -chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
> -			size_t bytes)
> -{
> -  while (bytes >= CHACHA20_BLOCK_SIZE)
> -    {
> -      chacha20_block (state, dst, src);
> -
> -      bytes -= CHACHA20_BLOCK_SIZE;
> -      dst += CHACHA20_BLOCK_SIZE;
> -      src += CHACHA20_BLOCK_SIZE;
> -    }
> -
> -  if (__glibc_unlikely (bytes != 0))
> -    {
> -      uint8_t stream[CHACHA20_BLOCK_SIZE];
> -      chacha20_block (state, stream, src);
> -      memcpy (dst, stream, bytes);
> -      explicit_bzero (stream, sizeof stream);
> -    }
> -}
> -
> -/* Get the architecture optimized version.  */
> -#include <chacha20_arch.h>
> diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c
> deleted file mode 100644
> index 45ba54920d..0000000000
> --- a/stdlib/tst-arc4random-chacha20.c
> +++ /dev/null
> @@ -1,167 +0,0 @@
> -/* Basic tests for chacha20 cypher used in arc4random.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <arc4random.h>
> -#include <support/check.h>
> -#include <sys/cdefs.h>
> -
> -/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random
> -   actual does.  */
> -#include <chacha20.c>
> -
> -static int
> -do_test (void)
> -{
> -  const uint8_t key[CHACHA20_KEY_SIZE] =
> -    {
> -      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> -      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> -      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> -      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> -    };
> -  const uint8_t iv[CHACHA20_IV_SIZE] =
> -    {
> -      0x0, 0x0, 0x0, 0x0,
> -      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> -    };
> -  const uint8_t expected1[CHACHA20_BUFSIZE] =
> -    {
> -      0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a,
> -      0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d,
> -      0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda,
> -      0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f,
> -      0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1,
> -      0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07,
> -      0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73,
> -      0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69,
> -      0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21,
> -      0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8,
> -      0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac,
> -      0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6,
> -      0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0,
> -      0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3,
> -      0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f,
> -      0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e,
> -      0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71,
> -      0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3,
> -      0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5,
> -      0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34,
> -      0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d,
> -      0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c,
> -      0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31,
> -      0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad,
> -      0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0,
> -      0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72,
> -      0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16,
> -      0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9,
> -      0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a,
> -      0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc,
> -      0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2,
> -      0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb,
> -      0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10,
> -      0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b,
> -      0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2,
> -      0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17,
> -      0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca,
> -      0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84,
> -      0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3,
> -      0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1,
> -      0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7,
> -      0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9,
> -      0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3,
> -      0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e,
> -      0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb,
> -      0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8,
> -      0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb
> -    };
> -
> -  const uint8_t expected2[CHACHA20_BUFSIZE] =
> -    {
> -      0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36,
> -      0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22,
> -      0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50,
> -      0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10,
> -      0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef,
> -      0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30,
> -      0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c,
> -      0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6,
> -      0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e,
> -      0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90,
> -      0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3,
> -      0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba,
> -      0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4,
> -      0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea,
> -      0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a,
> -      0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80,
> -      0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2,
> -      0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8,
> -      0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce,
> -      0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb,
> -      0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9,
> -      0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc,
> -      0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86,
> -      0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0,
> -      0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4,
> -      0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1,
> -      0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4,
> -      0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f,
> -      0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57,
> -      0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97,
> -      0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef,
> -      0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13,
> -      0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65,
> -      0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30,
> -      0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6,
> -      0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee,
> -      0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c,
> -      0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66,
> -      0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a,
> -      0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c,
> -      0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65,
> -      0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b,
> -      0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88,
> -      0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1,
> -      0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0,
> -      0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36,
> -      0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18
> -    };
> -
> -  /* Check with the expected internal arc4random keystream buffer.  Some
> -     architecture optimizations expects a buffer with a minimum size which
> -     is a multiple of then ChaCha20 blocksize, so they might not be prepared
> -     to handle smaller buffers.  */
> -
> -  uint8_t output[CHACHA20_BUFSIZE];
> -
> -  uint32_t state[CHACHA20_STATE_LEN];
> -  chacha20_init (state, key, iv);
> -
> -  /* Check with the initial state.  */
> -  uint8_t input[CHACHA20_BUFSIZE] = { 0 };
> -
> -  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
> -  TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE);
> -
> -  /* And on the next round.  */
> -  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
> -  TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE);
> -
> -  return 0;
> -}
> -
> -#include <support/test-driver.c>
> diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
> index 7dfd1b62dd..17fb1c5b72 100644
> --- a/sysdeps/aarch64/Makefile
> +++ b/sysdeps/aarch64/Makefile
> @@ -51,10 +51,6 @@ ifeq ($(subdir),csu)
>  gen-as-const-headers += tlsdesc.sym
>  endif
>  
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += chacha20-aarch64
> -endif
> -
>  ifeq ($(subdir),gmon)
>  CFLAGS-mcount.c += -mgeneral-regs-only
>  endif
> diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S
> deleted file mode 100644
> index cce5291c5c..0000000000
> --- a/sysdeps/aarch64/chacha20-aarch64.S
> +++ /dev/null
> @@ -1,314 +0,0 @@
> -/* Optimized AArch64 implementation of ChaCha20 cipher.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
> -
> -   This file is part of Libgcrypt.
> -
> -   Libgcrypt is free software; you can redistribute it and/or modify
> -   it under the terms of the GNU Lesser General Public License as
> -   published by the Free Software Foundation; either version 2.1 of
> -   the License, or (at your option) any later version.
> -
> -   Libgcrypt is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -   GNU Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with this program; if not, see <https://www.gnu.org/licenses/>.
> - */
> -
> -/* Based on D. J. Bernstein reference implementation at
> -   http://cr.yp.to/chacha.html:
> -
> -   chacha-regs.c version 20080118
> -   D. J. Bernstein
> -   Public domain.  */
> -
> -#include <sysdep.h>
> -
> -/* Only LE is supported.  */
> -#ifdef __AARCH64EL__
> -
> -#define GET_DATA_POINTER(reg, name) \
> -        adrp    reg, name ; \
> -        add     reg, reg, :lo12:name
> -
> -/* 'ret' instruction replacement for straight-line speculation mitigation */
> -#define ret_spec_stop \
> -        ret; dsb sy; isb;
> -
> -.cpu generic+simd
> -
> -.text
> -
> -/* register macros */
> -#define INPUT     x0
> -#define DST       x1
> -#define SRC       x2
> -#define NBLKS     x3
> -#define ROUND     x4
> -#define INPUT_CTR x5
> -#define INPUT_POS x6
> -#define CTR       x7
> -
> -/* vector registers */
> -#define X0 v16
> -#define X4 v17
> -#define X8 v18
> -#define X12 v19
> -
> -#define X1 v20
> -#define X5 v21
> -
> -#define X9 v22
> -#define X13 v23
> -#define X2 v24
> -#define X6 v25
> -
> -#define X3 v26
> -#define X7 v27
> -#define X11 v28
> -#define X15 v29
> -
> -#define X10 v30
> -#define X14 v31
> -
> -#define VCTR    v0
> -#define VTMP0   v1
> -#define VTMP1   v2
> -#define VTMP2   v3
> -#define VTMP3   v4
> -#define X12_TMP v5
> -#define X13_TMP v6
> -#define ROT8    v7
> -
> -/**********************************************************************
> -  helper macros
> - **********************************************************************/
> -
> -#define _(...) __VA_ARGS__
> -
> -#define vpunpckldq(s1, s2, dst) \
> -	zip1 dst.4s, s2.4s, s1.4s;
> -
> -#define vpunpckhdq(s1, s2, dst) \
> -	zip2 dst.4s, s2.4s, s1.4s;
> -
> -#define vpunpcklqdq(s1, s2, dst) \
> -	zip1 dst.2d, s2.2d, s1.2d;
> -
> -#define vpunpckhqdq(s1, s2, dst) \
> -	zip2 dst.2d, s2.2d, s1.2d;
> -
> -/* 4x4 32-bit integer matrix transpose */
> -#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
> -	vpunpckhdq(x1, x0, t2); \
> -	vpunpckldq(x1, x0, x0); \
> -	\
> -	vpunpckldq(x3, x2, t1); \
> -	vpunpckhdq(x3, x2, x2); \
> -	\
> -	vpunpckhqdq(t1, x0, x1); \
> -	vpunpcklqdq(t1, x0, x0); \
> -	\
> -	vpunpckhqdq(x2, t2, x3); \
> -	vpunpcklqdq(x2, t2, x2);
> -
> -/**********************************************************************
> -  4-way chacha20
> - **********************************************************************/
> -
> -#define XOR(d,s1,s2) \
> -	eor d.16b, s2.16b, s1.16b;
> -
> -#define PLUS(ds,s) \
> -	add ds.4s, ds.4s, s.4s;
> -
> -#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \
> -	shl dst1.4s, src1.4s, #(c);		\
> -	shl dst2.4s, src2.4s, #(c);		\
> -	shl dst3.4s, src3.4s, #(c);		\
> -	shl dst4.4s, src4.4s, #(c);		\
> -	sri dst1.4s, src1.4s, #(32 - (c));	\
> -	sri dst2.4s, src2.4s, #(32 - (c));	\
> -	sri dst3.4s, src3.4s, #(32 - (c));	\
> -	sri dst4.4s, src4.4s, #(32 - (c));
> -
> -#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
> -	tbl dst1.16b, {src1.16b}, ROT8.16b;     \
> -	tbl dst2.16b, {src2.16b}, ROT8.16b;	\
> -	tbl dst3.16b, {src3.16b}, ROT8.16b;	\
> -	tbl dst4.16b, {src4.16b}, ROT8.16b;
> -
> -#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
> -	rev32 dst1.8h, src1.8h;			\
> -	rev32 dst2.8h, src2.8h;			\
> -	rev32 dst3.8h, src3.8h;			\
> -	rev32 dst4.8h, src4.8h;
> -
> -#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \
> -	PLUS(a1,b1); PLUS(a2,b2);						\
> -	PLUS(a3,b3); PLUS(a4,b4);						\
> -	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
> -	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
> -		ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4);		\
> -	PLUS(c1,d1); PLUS(c2,d2);						\
> -	PLUS(c3,d3); PLUS(c4,d4);						\
> -	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
> -	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
> -		ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4)		\
> -	PLUS(a1,b1); PLUS(a2,b2);						\
> -	PLUS(a3,b3); PLUS(a4,b4);						\
> -	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
> -	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
> -		ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4)		\
> -	PLUS(c1,d1); PLUS(c2,d2);						\
> -	PLUS(c3,d3); PLUS(c4,d4);						\
> -	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
> -	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
> -		ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4)		\
> -
> -.align 4
> -L(__chacha20_blocks4_data_inc_counter):
> -	.long 0,1,2,3
> -
> -.align 4
> -L(__chacha20_blocks4_data_rot8):
> -	.byte 3,0,1,2
> -	.byte 7,4,5,6
> -	.byte 11,8,9,10
> -	.byte 15,12,13,14
> -
> -.hidden __chacha20_neon_blocks4
> -ENTRY (__chacha20_neon_blocks4)
> -	/* input:
> -	 *	x0: input
> -	 *	x1: dst
> -	 *	x2: src
> -	 *	x3: nblks (multiple of 4)
> -	 */
> -
> -	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8))
> -	add INPUT_CTR, INPUT, #(12*4);
> -	ld1 {ROT8.16b}, [CTR];
> -	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter))
> -	mov INPUT_POS, INPUT;
> -	ld1 {VCTR.16b}, [CTR];
> -
> -L(loop4):
> -	/* Construct counter vectors X12 and X13 */
> -
> -	ld1 {X15.16b}, [INPUT_CTR];
> -	mov ROUND, #20;
> -	ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
> -
> -	dup X12.4s, X15.s[0];
> -	dup X13.4s, X15.s[1];
> -	ldr CTR, [INPUT_CTR];
> -	add X12.4s, X12.4s, VCTR.4s;
> -	dup X0.4s, VTMP1.s[0];
> -	dup X1.4s, VTMP1.s[1];
> -	dup X2.4s, VTMP1.s[2];
> -	dup X3.4s, VTMP1.s[3];
> -	dup X14.4s, X15.s[2];
> -	cmhi VTMP0.4s, VCTR.4s, X12.4s;
> -	dup X15.4s, X15.s[3];
> -	add CTR, CTR, #4; /* Update counter */
> -	dup X4.4s, VTMP2.s[0];
> -	dup X5.4s, VTMP2.s[1];
> -	dup X6.4s, VTMP2.s[2];
> -	dup X7.4s, VTMP2.s[3];
> -	sub X13.4s, X13.4s, VTMP0.4s;
> -	dup X8.4s, VTMP3.s[0];
> -	dup X9.4s, VTMP3.s[1];
> -	dup X10.4s, VTMP3.s[2];
> -	dup X11.4s, VTMP3.s[3];
> -	mov X12_TMP.16b, X12.16b;
> -	mov X13_TMP.16b, X13.16b;
> -	str CTR, [INPUT_CTR];
> -
> -L(round2):
> -	subs ROUND, ROUND, #2
> -	QUARTERROUND4(X0, X4,  X8, X12,   X1, X5,  X9, X13,
> -		      X2, X6, X10, X14,   X3, X7, X11, X15,
> -		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
> -	QUARTERROUND4(X0, X5, X10, X15,   X1, X6, X11, X12,
> -		      X2, X7,  X8, X13,   X3, X4,  X9, X14,
> -		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
> -	b.ne L(round2);
> -
> -	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
> -
> -	PLUS(X12, X12_TMP);        /* INPUT + 12 * 4 + counter */
> -	PLUS(X13, X13_TMP);        /* INPUT + 13 * 4 + counter */
> -
> -	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
> -	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
> -	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
> -	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
> -	PLUS(X0, VTMP2);
> -	PLUS(X1, VTMP3);
> -	PLUS(X2, X12_TMP);
> -	PLUS(X3, X13_TMP);
> -
> -	dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
> -	dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
> -	dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
> -	dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
> -	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
> -	mov INPUT_POS, INPUT;
> -	PLUS(X4, VTMP2);
> -	PLUS(X5, VTMP3);
> -	PLUS(X6, X12_TMP);
> -	PLUS(X7, X13_TMP);
> -
> -	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
> -	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
> -	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
> -	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
> -	dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
> -	dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
> -	PLUS(X8, VTMP2);
> -	PLUS(X9, VTMP3);
> -	PLUS(X10, X12_TMP);
> -	PLUS(X11, X13_TMP);
> -	PLUS(X14, VTMP0);
> -	PLUS(X15, VTMP1);
> -
> -	transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
> -	transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
> -	transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
> -	transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
> -
> -	subs NBLKS, NBLKS, #4;
> -
> -	st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64
> -	st1 {X1.16b,X5.16b}, [DST], #32;
> -	st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64
> -	st1 {X10.16b,X14.16b}, [DST], #32;
> -	st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64;
> -
> -	b.ne L(loop4);
> -
> -	ret_spec_stop
> -END (__chacha20_neon_blocks4)
> -
> -#endif
> diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
> deleted file mode 100644
> index 37dbb917f1..0000000000
> --- a/sysdeps/aarch64/chacha20_arch.h
> +++ /dev/null
> @@ -1,40 +0,0 @@
> -/* Chacha20 implementation, used on arc4random.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <ldsodefs.h>
> -#include <stdbool.h>
> -
> -unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
> -				      const uint8_t *src, size_t nblks)
> -     attribute_hidden;
> -
> -static void
> -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> -		size_t bytes)
> -{
> -  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
> -		  "CHACHA20_BUFSIZE not multiple of 4");
> -  _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
> -		  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
> -#ifdef __AARCH64EL__
> -  __chacha20_neon_blocks4 (state, dst, src,
> -			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -#else
> -  chacha20_crypt_generic (state, dst, src, bytes);
> -#endif
> -}
> diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
> deleted file mode 100644
> index 1b4559ccbc..0000000000
> --- a/sysdeps/generic/chacha20_arch.h
> +++ /dev/null
> @@ -1,24 +0,0 @@
> -/* Chacha20 implementation, generic interface for encrypt.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -static inline void
> -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> -		size_t bytes)
> -{
> -  chacha20_crypt_generic (state, dst, src, bytes);
> -}
> diff --git a/sysdeps/generic/not-cancel.h b/sysdeps/generic/not-cancel.h
> index acceb9b67f..b5a42c70d6 100644
> --- a/sysdeps/generic/not-cancel.h
> +++ b/sysdeps/generic/not-cancel.h
> @@ -20,6 +20,7 @@
>  # define NOT_CANCEL_H
>  
>  #include <fcntl.h>
> +#include <poll.h>
>  #include <unistd.h>
>  #include <sys/wait.h>
>  #include <time.h>
> @@ -50,5 +51,7 @@
>    __fcntl64 (fd, cmd, __VA_ARGS__)
>  #define __getrandom_nocancel(buf, size, flags) \
>    __getrandom (buf, size, flags)
> +#define __poll_infinity_nocancel(fds, nfds) \
> +  __poll (fds, nfds, -1)
>  
>  #endif /* NOT_CANCEL_H  */
> diff --git a/sysdeps/generic/tls-internal-struct.h b/sysdeps/generic/tls-internal-struct.h
> index a91915831b..d76c715a96 100644
> --- a/sysdeps/generic/tls-internal-struct.h
> +++ b/sysdeps/generic/tls-internal-struct.h
> @@ -23,7 +23,6 @@ struct tls_internal_t
>  {
>    char *strsignal_buf;
>    char *strerror_l_buf;
> -  struct arc4random_state_t *rand_state;
>  };
>  
>  #endif

Ok.
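
For reference, the __poll_infinity_nocancel macro added in the not-cancel.h
hunks above is plain poll with a timeout of -1, which blocks until a
descriptor is ready.  A minimal sketch of that behaviour follows; the names
wait_readable and read_when_ready are made up for illustration and are not
glibc interfaces, and the EINTR retry is my assumption, not something this
patch shows:

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: block until FD is readable.  A timeout of -1
   makes poll wait indefinitely, which is what the macro expands to.
   Retrying on EINTR is an assumption made for this sketch.  */
static int
wait_readable (int fd)
{
  struct pollfd pfd = { .fd = fd, .events = POLLIN };
  int r;

  do
    r = poll (&pfd, 1, -1);
  while (r < 0 && errno == EINTR);
  return r;
}

/* Hypothetical helper: wait for FD, then read up to LEN bytes.  */
static ssize_t
read_when_ready (int fd, void *buf, size_t len)
{
  if (wait_readable (fd) < 0)
    return -1;
  return read (fd, buf, len);
}
```

With a pipe that already holds one byte, read_when_ready returns at once;
on an empty pipe it would block until a writer shows up.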

> diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
> index 8a0f37d509..b32b31b5a9 100644
> --- a/sysdeps/generic/tls-internal.c
> +++ b/sysdeps/generic/tls-internal.c
> @@ -16,7 +16,6 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> -#include <stdlib/arc4random.h>
>  #include <string.h>
>  #include <tls-internal.h>
>  
> @@ -27,13 +26,4 @@ __glibc_tls_internal_free (void)
>  {
>    free (__tls_internal.strsignal_buf);
>    free (__tls_internal.strerror_l_buf);
> -
> -  if (__tls_internal.rand_state != NULL)
> -    {
> -      /* Clear any lingering random state prior so if the thread stack is
> -	 cached it won't leak any data.  */
> -      explicit_bzero (__tls_internal.rand_state,
> -		      sizeof (*__tls_internal.rand_state));
> -      free (__tls_internal.rand_state);
> -    }
>  }

Ok.
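
The block removed here wiped the per-thread state before freeing it.  As a
rough sketch of that pattern outside glibc (struct rng_state and both
function names are hypothetical; the real struct arc4random_state_t layout
is not shown in this hunk):

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <string.h>

/* Hypothetical generator state, standing in for the removed
   struct arc4random_state_t.  */
struct rng_state
{
  unsigned char key[32];
  size_t have;
};

/* Zero the state.  explicit_bzero is used instead of memset because a
   compiler may elide a memset to memory that is about to be freed.  */
static void
rng_state_wipe (struct rng_state *st)
{
  explicit_bzero (st, sizeof (*st));
}

/* Wipe and release ST; a NULL pointer is accepted and ignored.  */
static void
rng_state_free (struct rng_state *st)
{
  if (st == NULL)
    return;
  rng_state_wipe (st);
  free (st);
}
```

Once there is no userspace state to cache, as in this patch, none of this
is needed.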

> diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
> index 667068c8cf..e60b86fab1 100644
> --- a/sysdeps/mach/hurd/_Fork.c
> +++ b/sysdeps/mach/hurd/_Fork.c
> @@ -662,8 +662,6 @@ retry:
>        _hurd_malloc_fork_child ();
>        call_function_static_weak (__malloc_fork_unlock_child);
>  
> -      call_function_static_weak (__arc4random_fork_subprocess);
> -
>        /* Run things that want to run in the child task to set up.  */
>        RUN_HOOK (_hurd_fork_child_hook, ());
>  

Ok.

> diff --git a/sysdeps/mach/hurd/not-cancel.h b/sysdeps/mach/hurd/not-cancel.h
> index 9a3a7ed59a..ae58b734e3 100644
> --- a/sysdeps/mach/hurd/not-cancel.h
> +++ b/sysdeps/mach/hurd/not-cancel.h
> @@ -21,6 +21,7 @@
>  
>  #include <fcntl.h>
>  #include <unistd.h>
> +#include <poll.h>
>  #include <sys/wait.h>
>  #include <time.h>
>  #include <sys/uio.h>
> @@ -77,6 +78,9 @@ __typeof (__fcntl) __fcntl_nocancel;
>  #define __getrandom_nocancel(buf, size, flags) \
>    __getrandom (buf, size, flags)
>  
> +#define __poll_infinity_nocancel(fds, nfds) \
> +  __poll (fds, nfds, -1)
> +
>  #if IS_IN (libc)
>  hidden_proto (__close_nocancel)
>  hidden_proto (__close_nocancel_nostatus)
> diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
> index 7dc02569f6..dd568992e2 100644
> --- a/sysdeps/nptl/_Fork.c
> +++ b/sysdeps/nptl/_Fork.c
> @@ -43,8 +43,6 @@ _Fork (void)
>        self->robust_head.list = &self->robust_head;
>        INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
>  			     sizeof (struct robust_list_head));
> -
> -      call_function_static_weak (__arc4random_fork_subprocess);
>      }
>    return pid;
>  }

Ok.
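
The hook dropped from both _Fork implementations ran in the child,
presumably so the child would not continue from the parent's generator
state.  An application-level analogue can be sketched with pthread_atfork.
This is only an illustration: every name below is hypothetical, and _Fork
itself does not run atfork handlers, which is why glibc called the hook
directly.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stdbool.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical cached generator state (not glibc's layout).  */
static unsigned char rng_buf[64];
static bool rng_seeded;

/* Child-side handler: discard the inherited state so the child has to
   reseed before producing any output.  */
static void
rng_child_reset (void)
{
  explicit_bzero (rng_buf, sizeof (rng_buf));
  rng_seeded = false;
}

/* Register the handler; returns 0 on success.  */
static int
rng_install_fork_handler (void)
{
  return pthread_atfork (NULL, NULL, rng_child_reset);
}

/* Fork once and report whether the child saw the reset state:
   1 if it did, 0 if it did not, -1 on error.  */
static int
child_sees_reset (void)
{
  rng_seeded = true;
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit (rng_seeded ? 1 : 0);

  int status;
  if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status) == 0;
}
```

The parent's state is left untouched; only the child starts from scratch.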

> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
> deleted file mode 100644
> index 8c75165f7f..0000000000
> --- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
> +++ /dev/null
> @@ -1,4 +0,0 @@
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += chacha20-ppc
> -CFLAGS-chacha20-ppc.c += -mcpu=power8
> -endif
> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
> deleted file mode 100644
> index cf9e735326..0000000000
> --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
> +++ /dev/null
> @@ -1 +0,0 @@
> -#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
> deleted file mode 100644
> index 08494dc045..0000000000
> --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
> +++ /dev/null
> @@ -1,42 +0,0 @@
> -/* PowerPC optimization for ChaCha20.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <stdbool.h>
> -#include <ldsodefs.h>
> -
> -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
> -					const uint8_t *src, size_t nblks)
> -     attribute_hidden;
> -
> -static void
> -chacha20_crypt (uint32_t *state, uint8_t *dst,
> -		const uint8_t *src, size_t bytes)
> -{
> -  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
> -		  "CHACHA20_BUFSIZE not multiple of 4");
> -  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
> -		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
> -
> -  unsigned long int hwcap = GLRO(dl_hwcap);
> -  unsigned long int hwcap2 = GLRO(dl_hwcap2);
> -  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
> -    __chacha20_power8_blocks4 (state, dst, src,
> -			       CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -  else
> -    chacha20_crypt_generic (state, dst, src, bytes);
> -}
> diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
> index abb0aa3f11..71a59529f3 100644
> --- a/sysdeps/powerpc/powerpc64/power8/Makefile
> +++ b/sysdeps/powerpc/powerpc64/power8/Makefile
> @@ -1,8 +1,3 @@
>  ifeq ($(subdir),string)
>  sysdep_routines += strcasestr-ppc64
>  endif
> -
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += chacha20-ppc
> -CFLAGS-chacha20-ppc.c += -mcpu=power8
> -endif

Ok.
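
For anyone reading the vector code removed in the next hunks: the
QUARTERROUND2 macro there runs two ChaCha20 quarter rounds side by side,
with rotations of 16, 12, 8 and 7 bits.  A scalar sketch of a single
quarter round, following RFC 7539 section 2.1 (the function names are
illustrative and not taken from glibc):

```c
#include <stdint.h>

/* Rotate V left by N bits; N must be in the range 1..31.  */
static inline uint32_t
rotl32 (uint32_t v, unsigned int n)
{
  return (v << n) | (v >> (32 - n));
}

/* One ChaCha20 quarter round on four 32-bit state words: the same
   add / xor / rotate sequence that the vector macro applies to whole
   registers at a time.  */
static void
chacha_quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 7);
}
```

A full block applies this to the four columns and then the four diagonals
of the 4x4 state, ten times over, which is the "for (i = 20; i > 0; i -= 2)"
loop in the removed file.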

> diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
> deleted file mode 100644
> index 0bbdcb9363..0000000000
> --- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
> +++ /dev/null
> @@ -1,256 +0,0 @@
> -/* Optimized PowerPC implementation of ChaCha20 cipher.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20
> -   Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
> -
> -   This file is part of Libgcrypt.
> -
> -   Libgcrypt is free software; you can redistribute it and/or modify
> -   it under the terms of the GNU Lesser General Public License as
> -   published by the Free Software Foundation; either version 2.1 of
> -   the License, or (at your option) any later version.
> -
> -   Libgcrypt is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -   GNU Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with this program; if not, see <https://www.gnu.org/licenses/>.
> - */
> -
> -#include <altivec.h>
> -#include <endian.h>
> -#include <stddef.h>
> -#include <stdint.h>
> -#include <sys/cdefs.h>
> -
> -typedef vector unsigned char vector16x_u8;
> -typedef vector unsigned int vector4x_u32;
> -typedef vector unsigned long long vector2x_u64;
> -
> -#if __BYTE_ORDER == __BIG_ENDIAN
> -static const vector16x_u8 le_bswap_const =
> -  { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
> -#endif
> -
> -static inline vector4x_u32
> -vec_rol_elems (vector4x_u32 v, unsigned int idx)
> -{
> -#if __BYTE_ORDER != __BIG_ENDIAN
> -  return vec_sld (v, v, (16 - (4 * idx)) & 15);
> -#else
> -  return vec_sld (v, v, (4 * idx) & 15);
> -#endif
> -}
> -
> -static inline vector4x_u32
> -vec_load_le (unsigned long offset, const unsigned char *ptr)
> -{
> -  vector4x_u32 vec;
> -  vec = vec_vsx_ld (offset, (const uint32_t *)ptr);
> -#if __BYTE_ORDER == __BIG_ENDIAN
> -  vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec,
> -				 le_bswap_const);
> -#endif
> -  return vec;
> -}
> -
> -static inline void
> -vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr)
> -{
> -#if __BYTE_ORDER == __BIG_ENDIAN
> -  vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec,
> -			       le_bswap_const);
> -#endif
> -  vec_vsx_st (vec, offset, (uint32_t *)ptr);
> -}
> -
> -
> -static inline vector4x_u32
> -vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a)
> -{
> -#if __BYTE_ORDER == __BIG_ENDIAN
> -  static const vector16x_u8 swap32 =
> -    { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 };
> -  vector2x_u64 vec, add, sum;
> -
> -  vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32);
> -  add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32);
> -  sum = vec + add;
> -  return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32);
> -#else
> -  return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a));
> -#endif
> -}
> -
> -/**********************************************************************
> -  4-way chacha20
> - **********************************************************************/
> -
> -#define ROTATE(v1,rolv)			\
> -	__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv))
> -
> -#define PLUS(ds,s) \
> -	((ds) += (s))
> -
> -#define XOR(ds,s) \
> -	((ds) ^= (s))
> -
> -#define ADD_U64(v,a) \
> -	(v = vec_add_ctr_u64(v, a))
> -
> -/* 4x4 32-bit integer matrix transpose */
> -#define transpose_4x4(x0, x1, x2, x3) ({ \
> -	vector4x_u32 t1 = vec_mergeh(x0, x2); \
> -	vector4x_u32 t2 = vec_mergel(x0, x2); \
> -	vector4x_u32 t3 = vec_mergeh(x1, x3); \
> -	x3 = vec_mergel(x1, x3); \
> -	x0 = vec_mergeh(t1, t3); \
> -	x1 = vec_mergel(t1, t3); \
> -	x2 = vec_mergeh(t2, x3); \
> -	x3 = vec_mergel(t2, x3); \
> -      })
> -
> -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2)			\
> -	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
> -	    ROTATE(d1, rotate_16); ROTATE(d2, rotate_16);	\
> -	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
> -	    ROTATE(b1, rotate_12); ROTATE(b2, rotate_12);	\
> -	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
> -	    ROTATE(d1, rotate_8); ROTATE(d2, rotate_8);		\
> -	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
> -	    ROTATE(b1, rotate_7); ROTATE(b2, rotate_7);
> -
> -unsigned int attribute_hidden
> -__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src,
> -			   size_t nblks)
> -{
> -  vector4x_u32 counters_0123 = { 0, 1, 2, 3 };
> -  vector4x_u32 counter_4 = { 4, 0, 0, 0 };
> -  vector4x_u32 rotate_16 = { 16, 16, 16, 16 };
> -  vector4x_u32 rotate_12 = { 12, 12, 12, 12 };
> -  vector4x_u32 rotate_8 = { 8, 8, 8, 8 };
> -  vector4x_u32 rotate_7 = { 7, 7, 7, 7 };
> -  vector4x_u32 state0, state1, state2, state3;
> -  vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7;
> -  vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15;
> -  vector4x_u32 tmp;
> -  int i;
> -
> -  /* Force preload of constants to vector registers.  */
> -  __asm__ ("": "+v" (counters_0123) :: "memory");
> -  __asm__ ("": "+v" (counter_4) :: "memory");
> -  __asm__ ("": "+v" (rotate_16) :: "memory");
> -  __asm__ ("": "+v" (rotate_12) :: "memory");
> -  __asm__ ("": "+v" (rotate_8) :: "memory");
> -  __asm__ ("": "+v" (rotate_7) :: "memory");
> -
> -  state0 = vec_vsx_ld (0 * 16, state);
> -  state1 = vec_vsx_ld (1 * 16, state);
> -  state2 = vec_vsx_ld (2 * 16, state);
> -  state3 = vec_vsx_ld (3 * 16, state);
> -
> -  do
> -    {
> -      v0 = vec_splat (state0, 0);
> -      v1 = vec_splat (state0, 1);
> -      v2 = vec_splat (state0, 2);
> -      v3 = vec_splat (state0, 3);
> -      v4 = vec_splat (state1, 0);
> -      v5 = vec_splat (state1, 1);
> -      v6 = vec_splat (state1, 2);
> -      v7 = vec_splat (state1, 3);
> -      v8 = vec_splat (state2, 0);
> -      v9 = vec_splat (state2, 1);
> -      v10 = vec_splat (state2, 2);
> -      v11 = vec_splat (state2, 3);
> -      v12 = vec_splat (state3, 0);
> -      v13 = vec_splat (state3, 1);
> -      v14 = vec_splat (state3, 2);
> -      v15 = vec_splat (state3, 3);
> -
> -      v12 += counters_0123;
> -      v13 -= vec_cmplt (v12, counters_0123);
> -
> -      for (i = 20; i > 0; i -= 2)
> -	{
> -	  QUARTERROUND2 (v0, v4,  v8, v12,   v1, v5,  v9, v13)
> -	  QUARTERROUND2 (v2, v6, v10, v14,   v3, v7, v11, v15)
> -	  QUARTERROUND2 (v0, v5, v10, v15,   v1, v6, v11, v12)
> -	  QUARTERROUND2 (v2, v7,  v8, v13,   v3, v4,  v9, v14)
> -	}
> -
> -      v0 += vec_splat (state0, 0);
> -      v1 += vec_splat (state0, 1);
> -      v2 += vec_splat (state0, 2);
> -      v3 += vec_splat (state0, 3);
> -      v4 += vec_splat (state1, 0);
> -      v5 += vec_splat (state1, 1);
> -      v6 += vec_splat (state1, 2);
> -      v7 += vec_splat (state1, 3);
> -      v8 += vec_splat (state2, 0);
> -      v9 += vec_splat (state2, 1);
> -      v10 += vec_splat (state2, 2);
> -      v11 += vec_splat (state2, 3);
> -      tmp = vec_splat( state3, 0);
> -      tmp += counters_0123;
> -      v12 += tmp;
> -      v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123);
> -      v14 += vec_splat (state3, 2);
> -      v15 += vec_splat (state3, 3);
> -      ADD_U64 (state3, counter_4);
> -
> -      transpose_4x4 (v0, v1, v2, v3);
> -      transpose_4x4 (v4, v5, v6, v7);
> -      transpose_4x4 (v8, v9, v10, v11);
> -      transpose_4x4 (v12, v13, v14, v15);
> -
> -      vec_store_le (v0, (64 * 0 + 16 * 0), dst);
> -      vec_store_le (v1, (64 * 1 + 16 * 0), dst);
> -      vec_store_le (v2, (64 * 2 + 16 * 0), dst);
> -      vec_store_le (v3, (64 * 3 + 16 * 0), dst);
> -
> -      vec_store_le (v4, (64 * 0 + 16 * 1), dst);
> -      vec_store_le (v5, (64 * 1 + 16 * 1), dst);
> -      vec_store_le (v6, (64 * 2 + 16 * 1), dst);
> -      vec_store_le (v7, (64 * 3 + 16 * 1), dst);
> -
> -      vec_store_le (v8, (64 * 0 + 16 * 2), dst);
> -      vec_store_le (v9, (64 * 1 + 16 * 2), dst);
> -      vec_store_le (v10, (64 * 2 + 16 * 2), dst);
> -      vec_store_le (v11, (64 * 3 + 16 * 2), dst);
> -
> -      vec_store_le (v12, (64 * 0 + 16 * 3), dst);
> -      vec_store_le (v13, (64 * 1 + 16 * 3), dst);
> -      vec_store_le (v14, (64 * 2 + 16 * 3), dst);
> -      vec_store_le (v15, (64 * 3 + 16 * 3), dst);
> -
> -      src += 4*64;
> -      dst += 4*64;
> -
> -      nblks -= 4;
> -    }
> -  while (nblks);
> -
> -  vec_vsx_st (state3, 3 * 16, state);
> -
> -  return 0;
> -}
> diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
> deleted file mode 100644
> index ded06762b6..0000000000
> --- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
> +++ /dev/null
> @@ -1,37 +0,0 @@
> -/* PowerPC optimization for ChaCha20.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <stdbool.h>
> -#include <ldsodefs.h>
> -
> -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
> -					const uint8_t *src, size_t nblks)
> -     attribute_hidden;
> -
> -static void
> -chacha20_crypt (uint32_t *state, uint8_t *dst,
> -		const uint8_t *src, size_t bytes)
> -{
> -  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
> -		  "CHACHA20_BUFSIZE not multiple of 4");
> -  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
> -		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
> -
> -  __chacha20_power8_blocks4 (state, dst, src,
> -			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -}
> diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
> index 96c110f490..66ed844e68 100644
> --- a/sysdeps/s390/s390-64/Makefile
> +++ b/sysdeps/s390/s390-64/Makefile
> @@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache
>  endif
>  
>  endif # $(subdir) == elf
> -
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += \
> -  chacha20-s390x \
> -  # sysdep_routines
> -endif
> diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S
> deleted file mode 100644
> index e38504d370..0000000000
> --- a/sysdeps/s390/s390-64/chacha20-s390x.S
> +++ /dev/null
> @@ -1,573 +0,0 @@
> -/* Optimized s390x implementation of ChaCha20 cipher.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -/* chacha20-s390x.S  -  zSeries implementation of ChaCha20 cipher
> -
> -   Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi>
> -
> -   This file is part of Libgcrypt.
> -
> -   Libgcrypt is free software; you can redistribute it and/or modify
> -   it under the terms of the GNU Lesser General Public License as
> -   published by the Free Software Foundation; either version 2.1 of
> -   the License, or (at your option) any later version.
> -
> -   Libgcrypt is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -   GNU Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with this program; if not, see <https://www.gnu.org/licenses/>.
> - */
> -
> -#include <sysdep.h>
> -
> -#ifdef HAVE_S390_VX_ASM_SUPPORT
> -
> -/* CFA expressions are used for pointing CFA and registers to
> - * SP relative offsets. */
> -# define DW_REGNO_SP 15
> -
> -/* Fixed length encoding used for integers for now. */
> -# define DW_SLEB128_7BIT(value) \
> -        0x00|((value) & 0x7f)
> -# define DW_SLEB128_28BIT(value) \
> -        0x80|((value)&0x7f), \
> -        0x80|(((value)>>7)&0x7f), \
> -        0x80|(((value)>>14)&0x7f), \
> -        0x00|(((value)>>21)&0x7f)
> -
> -# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \
> -        .cfi_escape \
> -          0x0f, /* DW_CFA_def_cfa_expression */ \
> -            DW_SLEB128_7BIT(11), /* length */ \
> -          0x7f, /* DW_OP_breg15, rsp + constant */ \
> -            DW_SLEB128_28BIT(rsp_offs), \
> -          0x06, /* DW_OP_deref */ \
> -          0x23, /* DW_OP_plus_constu */ \
> -            DW_SLEB128_28BIT((cfa_depth)+160)
> -
> -.machine "z13+vx"
> -.text
> -
> -.balign 16
> -.Lconsts:
> -.Lwordswap:
> -	.byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3
> -.Lbswap128:
> -	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
> -.Lbswap32:
> -	.byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
> -.Lone:
> -	.long 0, 0, 0, 1
> -.Ladd_counter_0123:
> -	.long 0, 1, 2, 3
> -.Ladd_counter_4567:
> -	.long 4, 5, 6, 7
> -
> -/* register macros */
> -#define INPUT %r2
> -#define DST   %r3
> -#define SRC   %r4
> -#define NBLKS %r0
> -#define ROUND %r1
> -
> -/* stack structure */
> -
> -#define STACK_FRAME_STD    (8 * 16 + 8 * 4)
> -#define STACK_FRAME_F8_F15 (8 * 8)
> -#define STACK_FRAME_Y0_Y15 (16 * 16)
> -#define STACK_FRAME_CTR    (4 * 16)
> -#define STACK_FRAME_PARAMS (6 * 8)
> -
> -#define STACK_MAX   (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \
> -		     STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \
> -		     STACK_FRAME_PARAMS)
> -
> -#define STACK_F8     (STACK_MAX - STACK_FRAME_F8_F15)
> -#define STACK_F9     (STACK_F8 + 8)
> -#define STACK_F10    (STACK_F9 + 8)
> -#define STACK_F11    (STACK_F10 + 8)
> -#define STACK_F12    (STACK_F11 + 8)
> -#define STACK_F13    (STACK_F12 + 8)
> -#define STACK_F14    (STACK_F13 + 8)
> -#define STACK_F15    (STACK_F14 + 8)
> -#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15)
> -#define STACK_CTR    (STACK_Y0_Y15 - STACK_FRAME_CTR)
> -#define STACK_INPUT  (STACK_CTR - STACK_FRAME_PARAMS)
> -#define STACK_DST    (STACK_INPUT + 8)
> -#define STACK_SRC    (STACK_DST + 8)
> -#define STACK_NBLKS  (STACK_SRC + 8)
> -#define STACK_POCTX  (STACK_NBLKS + 8)
> -#define STACK_POSRC  (STACK_POCTX + 8)
> -
> -#define STACK_G0_H3  STACK_Y0_Y15
> -
> -/* vector registers */
> -#define A0 %v0
> -#define A1 %v1
> -#define A2 %v2
> -#define A3 %v3
> -
> -#define B0 %v4
> -#define B1 %v5
> -#define B2 %v6
> -#define B3 %v7
> -
> -#define C0 %v8
> -#define C1 %v9
> -#define C2 %v10
> -#define C3 %v11
> -
> -#define D0 %v12
> -#define D1 %v13
> -#define D2 %v14
> -#define D3 %v15
> -
> -#define E0 %v16
> -#define E1 %v17
> -#define E2 %v18
> -#define E3 %v19
> -
> -#define F0 %v20
> -#define F1 %v21
> -#define F2 %v22
> -#define F3 %v23
> -
> -#define G0 %v24
> -#define G1 %v25
> -#define G2 %v26
> -#define G3 %v27
> -
> -#define H0 %v28
> -#define H1 %v29
> -#define H2 %v30
> -#define H3 %v31
> -
> -#define IO0 E0
> -#define IO1 E1
> -#define IO2 E2
> -#define IO3 E3
> -#define IO4 F0
> -#define IO5 F1
> -#define IO6 F2
> -#define IO7 F3
> -
> -#define S0 G0
> -#define S1 G1
> -#define S2 G2
> -#define S3 G3
> -
> -#define TMP0 H0
> -#define TMP1 H1
> -#define TMP2 H2
> -#define TMP3 H3
> -
> -#define X0 A0
> -#define X1 A1
> -#define X2 A2
> -#define X3 A3
> -#define X4 B0
> -#define X5 B1
> -#define X6 B2
> -#define X7 B3
> -#define X8 C0
> -#define X9 C1
> -#define X10 C2
> -#define X11 C3
> -#define X12 D0
> -#define X13 D1
> -#define X14 D2
> -#define X15 D3
> -
> -#define Y0 E0
> -#define Y1 E1
> -#define Y2 E2
> -#define Y3 E3
> -#define Y4 F0
> -#define Y5 F1
> -#define Y6 F2
> -#define Y7 F3
> -#define Y8 G0
> -#define Y9 G1
> -#define Y10 G2
> -#define Y11 G3
> -#define Y12 H0
> -#define Y13 H1
> -#define Y14 H2
> -#define Y15 H3
> -
> -/**********************************************************************
> -  helper macros
> - **********************************************************************/
> -
> -#define _ /*_*/
> -
> -#define START_STACK(last_r) \
> -	lgr %r0, %r15; \
> -	lghi %r1, ~15; \
> -	stmg %r6, last_r, 6 * 8(%r15); \
> -	aghi %r0, -STACK_MAX; \
> -	ngr %r0, %r1; \
> -	lgr %r1, %r15; \
> -	cfi_def_cfa_register(1); \
> -	lgr %r15, %r0; \
> -	stg %r1, 0(%r15); \
> -	cfi_cfa_on_stack(0, 0); \
> -	std %f8, STACK_F8(%r15); \
> -	std %f9, STACK_F9(%r15); \
> -	std %f10, STACK_F10(%r15); \
> -	std %f11, STACK_F11(%r15); \
> -	std %f12, STACK_F12(%r15); \
> -	std %f13, STACK_F13(%r15); \
> -	std %f14, STACK_F14(%r15); \
> -	std %f15, STACK_F15(%r15);
> -
> -#define END_STACK(last_r) \
> -	lg %r1, 0(%r15); \
> -	ld %f8, STACK_F8(%r15); \
> -	ld %f9, STACK_F9(%r15); \
> -	ld %f10, STACK_F10(%r15); \
> -	ld %f11, STACK_F11(%r15); \
> -	ld %f12, STACK_F12(%r15); \
> -	ld %f13, STACK_F13(%r15); \
> -	ld %f14, STACK_F14(%r15); \
> -	ld %f15, STACK_F15(%r15); \
> -	lmg %r6, last_r, 6 * 8(%r1); \
> -	lgr %r15, %r1; \
> -	cfi_def_cfa_register(DW_REGNO_SP);
> -
> -#define PLUS(dst,src) \
> -	vaf dst, dst, src;
> -
> -#define XOR(dst,src) \
> -	vx dst, dst, src;
> -
> -#define ROTATE(v1,c) \
> -	verllf v1, v1, (c)(0);
> -
> -#define WORD_ROTATE(v1,s) \
> -	vsldb v1, v1, v1, ((s) * 4);
> -
> -#define DST_8(OPER, I, J) \
> -	OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \
> -	OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J);
> -
> -/**********************************************************************
> -  round macros
> - **********************************************************************/
> -
> -/**********************************************************************
> -  8-way chacha20 ("vertical")
> - **********************************************************************/
> -
> -#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
> -			      x8,x9,x10,x11,x12,x13,x14,x15,\
> -			      y0,y1,y2,y3,y4,y5,y6,y7,\
> -			      y8,y9,y10,y11,y12,y13,y14,y15,\
> -			      op1,op2,op3,op4,op5,op6,op7,op8,\
> -			      op9,op10,op11,op12) \
> -	op1;							\
> -	PLUS(x0, x1); PLUS(x4, x5);				\
> -	PLUS(x8, x9); PLUS(x12, x13);				\
> -	PLUS(y0, y1); PLUS(y4, y5);				\
> -	PLUS(y8, y9); PLUS(y12, y13);				\
> -	    op2;						\
> -	    XOR(x3, x0);  XOR(x7, x4);				\
> -	    XOR(x11, x8); XOR(x15, x12);			\
> -	    XOR(y3, y0);  XOR(y7, y4);				\
> -	    XOR(y11, y8); XOR(y15, y12);			\
> -		op3;						\
> -		ROTATE(x3, 16); ROTATE(x7, 16);			\
> -		ROTATE(x11, 16); ROTATE(x15, 16);		\
> -		ROTATE(y3, 16); ROTATE(y7, 16);			\
> -		ROTATE(y11, 16); ROTATE(y15, 16);		\
> -	op4;							\
> -	PLUS(x2, x3); PLUS(x6, x7);				\
> -	PLUS(x10, x11); PLUS(x14, x15);				\
> -	PLUS(y2, y3); PLUS(y6, y7);				\
> -	PLUS(y10, y11); PLUS(y14, y15);				\
> -	    op5;						\
> -	    XOR(x1, x2); XOR(x5, x6);				\
> -	    XOR(x9, x10); XOR(x13, x14);			\
> -	    XOR(y1, y2); XOR(y5, y6);				\
> -	    XOR(y9, y10); XOR(y13, y14);			\
> -		op6;						\
> -		ROTATE(x1,12); ROTATE(x5,12);			\
> -		ROTATE(x9,12); ROTATE(x13,12);			\
> -		ROTATE(y1,12); ROTATE(y5,12);			\
> -		ROTATE(y9,12); ROTATE(y13,12);			\
> -	op7;							\
> -	PLUS(x0, x1); PLUS(x4, x5);				\
> -	PLUS(x8, x9); PLUS(x12, x13);				\
> -	PLUS(y0, y1); PLUS(y4, y5);				\
> -	PLUS(y8, y9); PLUS(y12, y13);				\
> -	    op8;						\
> -	    XOR(x3, x0); XOR(x7, x4);				\
> -	    XOR(x11, x8); XOR(x15, x12);			\
> -	    XOR(y3, y0); XOR(y7, y4);				\
> -	    XOR(y11, y8); XOR(y15, y12);			\
> -		op9;						\
> -		ROTATE(x3,8); ROTATE(x7,8);			\
> -		ROTATE(x11,8); ROTATE(x15,8);			\
> -		ROTATE(y3,8); ROTATE(y7,8);			\
> -		ROTATE(y11,8); ROTATE(y15,8);			\
> -	op10;							\
> -	PLUS(x2, x3); PLUS(x6, x7);				\
> -	PLUS(x10, x11); PLUS(x14, x15);				\
> -	PLUS(y2, y3); PLUS(y6, y7);				\
> -	PLUS(y10, y11); PLUS(y14, y15);				\
> -	    op11;						\
> -	    XOR(x1, x2); XOR(x5, x6);				\
> -	    XOR(x9, x10); XOR(x13, x14);			\
> -	    XOR(y1, y2); XOR(y5, y6);				\
> -	    XOR(y9, y10); XOR(y13, y14);			\
> -		op12;						\
> -		ROTATE(x1,7); ROTATE(x5,7);			\
> -		ROTATE(x9,7); ROTATE(x13,7);			\
> -		ROTATE(y1,7); ROTATE(y5,7);			\
> -		ROTATE(y9,7); ROTATE(y13,7);
> -
> -#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\
> -			 y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \
> -	QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
> -			      x8,x9,x10,x11,x12,x13,x14,x15,\
> -			      y0,y1,y2,y3,y4,y5,y6,y7,\
> -			      y8,y9,y10,y11,y12,y13,y14,y15,\
> -			      ,,,,,,,,,,,)
> -
> -#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \
> -	  vmrhf tmp0, v0, v1;					\
> -	  vmrhf tmp1, v2, v3;					\
> -	  vmrlf tmp2, v0, v1;					\
> -	  vmrlf   v3, v2, v3;					\
> -	  vmrhf tmpa, va, vb;					\
> -	  vmrhf tmpb, vc, vd;					\
> -	  vmrlf tmpc, va, vb;					\
> -	  vmrlf   vd, vc, vd;					\
> -	  vpdi v0, tmp0, tmp1, 0;				\
> -	  vpdi v1, tmp0, tmp1, 5;				\
> -	  vpdi v2, tmp2,   v3, 0;				\
> -	  vpdi v3, tmp2,   v3, 5;				\
> -	  vpdi va, tmpa, tmpb, 0;				\
> -	  vpdi vb, tmpa, tmpb, 5;				\
> -	  vpdi vc, tmpc,   vd, 0;				\
> -	  vpdi vd, tmpc,   vd, 5;
> -
> -.balign 8
> -.globl __chacha20_s390x_vx_blocks8
> -ENTRY (__chacha20_s390x_vx_blocks8)
> -	/* input:
> -	 *	%r2: input
> -	 *	%r3: dst
> -	 *	%r4: src
> -	 *	%r5: nblks (multiple of 8)
> -	 */
> -
> -	START_STACK(%r8);
> -	lgr NBLKS, %r5;
> -
> -	larl %r7, .Lconsts;
> -
> -	/* Load counter. */
> -	lg %r8, (12 * 4)(INPUT);
> -	rllg %r8, %r8, 32;
> -
> -.balign 4
> -	/* Process eight chacha20 blocks per loop. */
> -.Lloop8:
> -	vlm Y0, Y3, 0(INPUT);
> -
> -	slgfi NBLKS, 8;
> -	lghi ROUND, (20 / 2);
> -
> -	/* Construct counter vectors X12/X13 & Y12/Y13. */
> -	vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7);
> -	vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7);
> -	vrepf Y12, Y3, 0;
> -	vrepf Y13, Y3, 1;
> -	vaccf X5, Y12, X4;
> -	vaccf Y5, Y12, Y4;
> -	vaf X12, Y12, X4;
> -	vaf Y12, Y12, Y4;
> -	vaf X13, Y13, X5;
> -	vaf Y13, Y13, Y5;
> -
> -	vrepf X0, Y0, 0;
> -	vrepf X1, Y0, 1;
> -	vrepf X2, Y0, 2;
> -	vrepf X3, Y0, 3;
> -	vrepf X4, Y1, 0;
> -	vrepf X5, Y1, 1;
> -	vrepf X6, Y1, 2;
> -	vrepf X7, Y1, 3;
> -	vrepf X8, Y2, 0;
> -	vrepf X9, Y2, 1;
> -	vrepf X10, Y2, 2;
> -	vrepf X11, Y2, 3;
> -	vrepf X14, Y3, 2;
> -	vrepf X15, Y3, 3;
> -
> -	/* Store counters for blocks 0-7. */
> -	vstm X12, X13, (STACK_CTR + 0 * 16)(%r15);
> -	vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15);
> -
> -	vlr Y0, X0;
> -	vlr Y1, X1;
> -	vlr Y2, X2;
> -	vlr Y3, X3;
> -	vlr Y4, X4;
> -	vlr Y5, X5;
> -	vlr Y6, X6;
> -	vlr Y7, X7;
> -	vlr Y8, X8;
> -	vlr Y9, X9;
> -	vlr Y10, X10;
> -	vlr Y11, X11;
> -	vlr Y14, X14;
> -	vlr Y15, X15;
> -
> -	/* Update and store counter. */
> -	agfi %r8, 8;
> -	rllg %r5, %r8, 32;
> -	stg %r5, (12 * 4)(INPUT);
> -
> -.balign 4
> -.Lround2_8:
> -	QUARTERROUND4_V8(X0, X4,  X8, X12,   X1, X5,  X9, X13,
> -			 X2, X6, X10, X14,   X3, X7, X11, X15,
> -			 Y0, Y4,  Y8, Y12,   Y1, Y5,  Y9, Y13,
> -			 Y2, Y6, Y10, Y14,   Y3, Y7, Y11, Y15);
> -	QUARTERROUND4_V8(X0, X5, X10, X15,   X1, X6, X11, X12,
> -			 X2, X7,  X8, X13,   X3, X4,  X9, X14,
> -			 Y0, Y5, Y10, Y15,   Y1, Y6, Y11, Y12,
> -			 Y2, Y7,  Y8, Y13,   Y3, Y4,  Y9, Y14);
> -	brctg ROUND, .Lround2_8;
> -
> -	/* Store blocks 4-7. */
> -	vstm Y0, Y15, STACK_Y0_Y15(%r15);
> -
> -	/* Load counters for blocks 0-3. */
> -	vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15);
> -
> -	lghi ROUND, 1;
> -	j .Lfirst_output_4blks_8;
> -
> -.balign 4
> -.Lsecond_output_4blks_8:
> -	/* Load blocks 4-7. */
> -	vlm X0, X15, STACK_Y0_Y15(%r15);
> -
> -	/* Load counters for blocks 4-7. */
> -	vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15);
> -
> -	lghi ROUND, 0;
> -
> -.balign 4
> -	/* Output four chacha20 blocks per loop. */
> -.Lfirst_output_4blks_8:
> -	vlm Y12, Y15, 0(INPUT);
> -	PLUS(X12, Y0);
> -	PLUS(X13, Y1);
> -	vrepf Y0, Y12, 0;
> -	vrepf Y1, Y12, 1;
> -	vrepf Y2, Y12, 2;
> -	vrepf Y3, Y12, 3;
> -	vrepf Y4, Y13, 0;
> -	vrepf Y5, Y13, 1;
> -	vrepf Y6, Y13, 2;
> -	vrepf Y7, Y13, 3;
> -	vrepf Y8, Y14, 0;
> -	vrepf Y9, Y14, 1;
> -	vrepf Y10, Y14, 2;
> -	vrepf Y11, Y14, 3;
> -	vrepf Y14, Y15, 2;
> -	vrepf Y15, Y15, 3;
> -	PLUS(X0, Y0);
> -	PLUS(X1, Y1);
> -	PLUS(X2, Y2);
> -	PLUS(X3, Y3);
> -	PLUS(X4, Y4);
> -	PLUS(X5, Y5);
> -	PLUS(X6, Y6);
> -	PLUS(X7, Y7);
> -	PLUS(X8, Y8);
> -	PLUS(X9, Y9);
> -	PLUS(X10, Y10);
> -	PLUS(X11, Y11);
> -	PLUS(X14, Y14);
> -	PLUS(X15, Y15);
> -
> -	vl Y15, (.Lbswap32 - .Lconsts)(%r7);
> -	TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7,
> -			Y9, Y10, Y11, Y12, Y13, Y14);
> -	TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15,
> -			Y9, Y10, Y11, Y12, Y13, Y14);
> -
> -	vlm Y0, Y14, 0(SRC);
> -	vperm X0, X0, X0, Y15;
> -	vperm X1, X1, X1, Y15;
> -	vperm X2, X2, X2, Y15;
> -	vperm X3, X3, X3, Y15;
> -	vperm X4, X4, X4, Y15;
> -	vperm X5, X5, X5, Y15;
> -	vperm X6, X6, X6, Y15;
> -	vperm X7, X7, X7, Y15;
> -	vperm X8, X8, X8, Y15;
> -	vperm X9, X9, X9, Y15;
> -	vperm X10, X10, X10, Y15;
> -	vperm X11, X11, X11, Y15;
> -	vperm X12, X12, X12, Y15;
> -	vperm X13, X13, X13, Y15;
> -	vperm X14, X14, X14, Y15;
> -	vperm X15, X15, X15, Y15;
> -	vl Y15, (15 * 16)(SRC);
> -
> -	XOR(Y0, X0);
> -	XOR(Y1, X4);
> -	XOR(Y2, X8);
> -	XOR(Y3, X12);
> -	XOR(Y4, X1);
> -	XOR(Y5, X5);
> -	XOR(Y6, X9);
> -	XOR(Y7, X13);
> -	XOR(Y8, X2);
> -	XOR(Y9, X6);
> -	XOR(Y10, X10);
> -	XOR(Y11, X14);
> -	XOR(Y12, X3);
> -	XOR(Y13, X7);
> -	XOR(Y14, X11);
> -	XOR(Y15, X15);
> -	vstm Y0, Y15, 0(DST);
> -
> -	aghi SRC, 256;
> -	aghi DST, 256;
> -
> -	clgije ROUND, 1, .Lsecond_output_4blks_8;
> -
> -	clgijhe NBLKS, 8, .Lloop8;
> -
> -
> -	END_STACK(%r8);
> -	xgr %r2, %r2;
> -	br %r14;
> -END (__chacha20_s390x_vx_blocks8)
> -
> -#endif /* HAVE_S390_VX_ASM_SUPPORT */

Ok.

> diff --git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h
> deleted file mode 100644
> index 0c6abf77e8..0000000000
> --- a/sysdeps/s390/s390-64/chacha20_arch.h
> +++ /dev/null
> @@ -1,45 +0,0 @@
> -/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <stdbool.h>
> -#include <ldsodefs.h>
> -#include <sys/auxv.h>
> -
> -unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst,
> -					  const uint8_t *src, size_t nblks)
> -     attribute_hidden;
> -
> -static inline void
> -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> -		size_t bytes)
> -{
> -#ifdef HAVE_S390_VX_ASM_SUPPORT
> -  _Static_assert (CHACHA20_BUFSIZE % 8 == 0,
> -		  "CHACHA20_BUFSIZE not multiple of 8");
> -  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
> -		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
> -
> -  if (GLRO(dl_hwcap) & HWCAP_S390_VX)
> -    {
> -      __chacha20_s390x_vx_blocks8 (state, dst, src,
> -				   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -      return;
> -    }
> -#endif
> -  chacha20_crypt_generic (state, dst, src, bytes);
> -}
> diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h
> index 2c58d5ae2f..a263d294b1 100644
> --- a/sysdeps/unix/sysv/linux/not-cancel.h
> +++ b/sysdeps/unix/sysv/linux/not-cancel.h
> @@ -23,6 +23,7 @@
>  #include <sysdep.h>
>  #include <errno.h>
>  #include <unistd.h>
> +#include <sys/poll.h>
>  #include <sys/syscall.h>
>  #include <sys/wait.h>
>  #include <time.h>
> @@ -70,9 +71,14 @@ __writev_nocancel_nostatus (int fd, const struct iovec *iov, int iovcnt)
>  static inline int
>  __getrandom_nocancel (void *buf, size_t buflen, unsigned int flags)
>  {
> -  return INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags);
> +  return INLINE_SYSCALL_CALL (getrandom, buf, buflen, flags);
>  }
>  
> +static inline int
> +__poll_infinity_nocancel (struct pollfd *fds, nfds_t nfds)
> +{
> +  return INLINE_SYSCALL_CALL (ppoll, fds, nfds, NULL, NULL, 0);
> +}
>  
>  /* Uncancelable fcntl.  */
>  __typeof (__fcntl) __fcntl64_nocancel;

Ok, rv32 and arc already redefine __NR_ppoll as __NR_ppoll_time64, and we don't really
care about the timeout.
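
To illustrate why the timeout width does not matter here: with a NULL
timeout, ppoll() simply blocks until an event arrives, so no timespec is
ever passed to the kernel. The following is only a minimal userspace
sketch using the public ppoll() wrapper; poll_forever and
pipe_becomes_readable are hypothetical names for illustration, not the
glibc-internal helper from the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for an "infinite" poll: a NULL timeout and a NULL
   signal mask make ppoll() block until an event is reported.  Retries if
   interrupted by a signal.  */
static int
poll_forever (struct pollfd *fds, nfds_t nfds)
{
  int r;
  do
    r = ppoll (fds, nfds, NULL, NULL);
  while (r < 0 && errno == EINTR);
  return r;
}

/* Demo helper (hypothetical): write one byte into a pipe and check that
   the read end is reported readable.  Returns 1 on success, 0 if the
   event was not reported, -1 if the pipe could not be set up.  */
int
pipe_becomes_readable (void)
{
  int p[2];
  if (pipe (p) < 0)
    return -1;
  if (write (p[1], "x", 1) != 1)
    {
      close (p[0]);
      close (p[1]);
      return -1;
    }

  struct pollfd pfd = { .fd = p[0], .events = POLLIN };
  int r = poll_forever (&pfd, 1);
  int ok = (r == 1 && (pfd.revents & POLLIN) != 0);

  close (p[0]);
  close (p[1]);
  return ok;
}
```

Because data is already queued before the call, the sketch returns
immediately; with nothing to read it would block indefinitely, which is
the behaviour the fallback path wants.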

> diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c
> index 0326ebb767..c8a9ed2d40 100644
> --- a/sysdeps/unix/sysv/linux/tls-internal.c
> +++ b/sysdeps/unix/sysv/linux/tls-internal.c
> @@ -16,7 +16,6 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> -#include <stdlib/arc4random.h>
>  #include <string.h>
>  #include <tls-internal.h>
>  
> @@ -26,13 +25,4 @@ __glibc_tls_internal_free (void)
>    struct pthread *self = THREAD_SELF;
>    free (self->tls_state.strsignal_buf);
>    free (self->tls_state.strerror_l_buf);
> -
> -  if (self->tls_state.rand_state != NULL)
> -    {
> -      /* Clear any lingering random state prior so if the thread stack is
> -         cached it won't leak any data.  */
> -      explicit_bzero (self->tls_state.rand_state,
> -		      sizeof (*self->tls_state.rand_state));
> -      free (self->tls_state.rand_state);
> -    }
>  }

Ok.

> diff --git a/sysdeps/unix/sysv/linux/tls-internal.h b/sysdeps/unix/sysv/linux/tls-internal.h
> index ebc65d896a..2ebe977802 100644
> --- a/sysdeps/unix/sysv/linux/tls-internal.h
> +++ b/sysdeps/unix/sysv/linux/tls-internal.h
> @@ -28,7 +28,6 @@ __glibc_tls_internal (void)
>    return &THREAD_SELF->tls_state;
>  }
>  
> -/* Reset the arc4random TCB state on fork.  */
>  extern void __glibc_tls_internal_free (void) attribute_hidden;
>  
>  #endif
> diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
> index 1178475d75..c19bef2dec 100644
> --- a/sysdeps/x86_64/Makefile
> +++ b/sysdeps/x86_64/Makefile
> @@ -5,13 +5,6 @@ ifeq ($(subdir),csu)
>  gen-as-const-headers += link-defines.sym
>  endif
>  
> -ifeq ($(subdir),stdlib)
> -sysdep_routines += \
> -  chacha20-amd64-sse2 \
> -  chacha20-amd64-avx2 \
> -  # sysdep_routines
> -endif
> -
>  ifeq ($(subdir),gmon)
>  sysdep_routines += _mcount
>  # We cannot compile _mcount.S with -pg because that would create

Ok.

> diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S
> deleted file mode 100644
> index aefd1cdbd0..0000000000
> --- a/sysdeps/x86_64/chacha20-amd64-avx2.S
> +++ /dev/null
> @@ -1,328 +0,0 @@
> -/* Optimized AVX2 implementation of ChaCha20 cipher.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -/* chacha20-amd64-avx2.S  -  AVX2 implementation of ChaCha20 cipher
> -
> -   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
> -
> -   This file is part of Libgcrypt.
> -
> -   Libgcrypt is free software; you can redistribute it and/or modify
> -   it under the terms of the GNU Lesser General Public License as
> -   published by the Free Software Foundation; either version 2.1 of
> -   the License, or (at your option) any later version.
> -
> -   Libgcrypt is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -   GNU Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with this program; if not, see <https://www.gnu.org/licenses/>.
> -*/
> -
> -/* Based on D. J. Bernstein reference implementation at
> -   http://cr.yp.to/chacha.html:
> -
> -   chacha-regs.c version 20080118
> -   D. J. Bernstein
> -   Public domain.  */
> -
> -#include <sysdep.h>
> -
> -#ifdef PIC
> -#  define rRIP (%rip)
> -#else
> -#  define rRIP
> -#endif
> -
> -/* register macros */
> -#define INPUT %rdi
> -#define DST   %rsi
> -#define SRC   %rdx
> -#define NBLKS %rcx
> -#define ROUND %eax
> -
> -/* stack structure */
> -#define STACK_VEC_X12 (32)
> -#define STACK_VEC_X13 (32 + STACK_VEC_X12)
> -#define STACK_TMP     (32 + STACK_VEC_X13)
> -#define STACK_TMP1    (32 + STACK_TMP)
> -
> -#define STACK_MAX     (32 + STACK_TMP1)
> -
> -/* vector registers */
> -#define X0 %ymm0
> -#define X1 %ymm1
> -#define X2 %ymm2
> -#define X3 %ymm3
> -#define X4 %ymm4
> -#define X5 %ymm5
> -#define X6 %ymm6
> -#define X7 %ymm7
> -#define X8 %ymm8
> -#define X9 %ymm9
> -#define X10 %ymm10
> -#define X11 %ymm11
> -#define X12 %ymm12
> -#define X13 %ymm13
> -#define X14 %ymm14
> -#define X15 %ymm15
> -
> -#define X0h %xmm0
> -#define X1h %xmm1
> -#define X2h %xmm2
> -#define X3h %xmm3
> -#define X4h %xmm4
> -#define X5h %xmm5
> -#define X6h %xmm6
> -#define X7h %xmm7
> -#define X8h %xmm8
> -#define X9h %xmm9
> -#define X10h %xmm10
> -#define X11h %xmm11
> -#define X12h %xmm12
> -#define X13h %xmm13
> -#define X14h %xmm14
> -#define X15h %xmm15
> -
> -/**********************************************************************
> -  helper macros
> - **********************************************************************/
> -
> -/* 4x4 32-bit integer matrix transpose */
> -#define transpose_4x4(x0,x1,x2,x3,t1,t2) \
> -	vpunpckhdq x1, x0, t2; \
> -	vpunpckldq x1, x0, x0; \
> -	\
> -	vpunpckldq x3, x2, t1; \
> -	vpunpckhdq x3, x2, x2; \
> -	\
> -	vpunpckhqdq t1, x0, x1; \
> -	vpunpcklqdq t1, x0, x0; \
> -	\
> -	vpunpckhqdq x2, t2, x3; \
> -	vpunpcklqdq x2, t2, x2;
> -
> -/* 2x2 128-bit matrix transpose */
> -#define transpose_16byte_2x2(x0,x1,t1) \
> -	vmovdqa    x0, t1; \
> -	vperm2i128 $0x20, x1, x0, x0; \
> -	vperm2i128 $0x31, x1, t1, x1;
> -
> -/**********************************************************************
> -  8-way chacha20
> - **********************************************************************/
> -
> -#define ROTATE2(v1,v2,c,tmp)	\
> -	vpsrld $(32 - (c)), v1, tmp;	\
> -	vpslld $(c), v1, v1;		\
> -	vpaddb tmp, v1, v1;		\
> -	vpsrld $(32 - (c)), v2, tmp;	\
> -	vpslld $(c), v2, v2;		\
> -	vpaddb tmp, v2, v2;
> -
> -#define ROTATE_SHUF_2(v1,v2,shuf)	\
> -	vpshufb shuf, v1, v1;		\
> -	vpshufb shuf, v2, v2;
> -
> -#define XOR(ds,s) \
> -	vpxor s, ds, ds;
> -
> -#define PLUS(ds,s) \
> -	vpaddd s, ds, ds;
> -
> -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\
> -		      interleave_op1,interleave_op2,\
> -		      interleave_op3,interleave_op4)		\
> -	vbroadcasti128 .Lshuf_rol16 rRIP, tmp1;			\
> -		interleave_op1;					\
> -	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
> -	    ROTATE_SHUF_2(d1, d2, tmp1);			\
> -		interleave_op2;					\
> -	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
> -	    ROTATE2(b1, b2, 12, tmp1);				\
> -	vbroadcasti128 .Lshuf_rol8 rRIP, tmp1;			\
> -		interleave_op3;					\
> -	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
> -	    ROTATE_SHUF_2(d1, d2, tmp1);			\
> -		interleave_op4;					\
> -	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
> -	    ROTATE2(b1, b2,  7, tmp1);
> -
> -	.section .text.avx2, "ax", @progbits
> -	.align 32
> -chacha20_data:
> -L(shuf_rol16):
> -	.byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
> -L(shuf_rol8):
> -	.byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
> -L(inc_counter):
> -	.byte 0,1,2,3,4,5,6,7
> -L(unsigned_cmp):
> -	.long 0x80000000
> -
> -	.hidden __chacha20_avx2_blocks8
> -ENTRY (__chacha20_avx2_blocks8)
> -	/* input:
> -	 *	%rdi: input
> -	 *	%rsi: dst
> -	 *	%rdx: src
> -	 *	%rcx: nblks (multiple of 8)
> -	 */
> -	vzeroupper;
> -
> -	pushq %rbp;
> -	cfi_adjust_cfa_offset(8);
> -	cfi_rel_offset(rbp, 0)
> -	movq %rsp, %rbp;
> -	cfi_def_cfa_register(rbp);
> -
> -	subq $STACK_MAX, %rsp;
> -	andq $~31, %rsp;
> -
> -L(loop8):
> -	mov $20, ROUND;
> -
> -	/* Construct counter vectors X12 and X13 */
> -	vpmovzxbd L(inc_counter) rRIP, X0;
> -	vpbroadcastd L(unsigned_cmp) rRIP, X2;
> -	vpbroadcastd (12 * 4)(INPUT), X12;
> -	vpbroadcastd (13 * 4)(INPUT), X13;
> -	vpaddd X0, X12, X12;
> -	vpxor X2, X0, X0;
> -	vpxor X2, X12, X1;
> -	vpcmpgtd X1, X0, X0;
> -	vpsubd X0, X13, X13;
> -	vmovdqa X12, (STACK_VEC_X12)(%rsp);
> -	vmovdqa X13, (STACK_VEC_X13)(%rsp);
> -
> -	/* Load vectors */
> -	vpbroadcastd (0 * 4)(INPUT), X0;
> -	vpbroadcastd (1 * 4)(INPUT), X1;
> -	vpbroadcastd (2 * 4)(INPUT), X2;
> -	vpbroadcastd (3 * 4)(INPUT), X3;
> -	vpbroadcastd (4 * 4)(INPUT), X4;
> -	vpbroadcastd (5 * 4)(INPUT), X5;
> -	vpbroadcastd (6 * 4)(INPUT), X6;
> -	vpbroadcastd (7 * 4)(INPUT), X7;
> -	vpbroadcastd (8 * 4)(INPUT), X8;
> -	vpbroadcastd (9 * 4)(INPUT), X9;
> -	vpbroadcastd (10 * 4)(INPUT), X10;
> -	vpbroadcastd (11 * 4)(INPUT), X11;
> -	vpbroadcastd (14 * 4)(INPUT), X14;
> -	vpbroadcastd (15 * 4)(INPUT), X15;
> -	vmovdqa X15, (STACK_TMP)(%rsp);
> -
> -L(round2):
> -	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X15,,,,)
> -	vmovdqa (STACK_TMP)(%rsp), X15;
> -	vmovdqa X8, (STACK_TMP)(%rsp);
> -	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,,,,)
> -	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,,,,)
> -	vmovdqa (STACK_TMP)(%rsp), X8;
> -	vmovdqa X15, (STACK_TMP)(%rsp);
> -	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X15,,,,)
> -	sub $2, ROUND;
> -	jnz L(round2);
> -
> -	vmovdqa X8, (STACK_TMP1)(%rsp);
> -
> -	/* tmp := X15 */
> -	vpbroadcastd (0 * 4)(INPUT), X15;
> -	PLUS(X0, X15);
> -	vpbroadcastd (1 * 4)(INPUT), X15;
> -	PLUS(X1, X15);
> -	vpbroadcastd (2 * 4)(INPUT), X15;
> -	PLUS(X2, X15);
> -	vpbroadcastd (3 * 4)(INPUT), X15;
> -	PLUS(X3, X15);
> -	vpbroadcastd (4 * 4)(INPUT), X15;
> -	PLUS(X4, X15);
> -	vpbroadcastd (5 * 4)(INPUT), X15;
> -	PLUS(X5, X15);
> -	vpbroadcastd (6 * 4)(INPUT), X15;
> -	PLUS(X6, X15);
> -	vpbroadcastd (7 * 4)(INPUT), X15;
> -	PLUS(X7, X15);
> -	transpose_4x4(X0, X1, X2, X3, X8, X15);
> -	transpose_4x4(X4, X5, X6, X7, X8, X15);
> -	vmovdqa (STACK_TMP1)(%rsp), X8;
> -	transpose_16byte_2x2(X0, X4, X15);
> -	transpose_16byte_2x2(X1, X5, X15);
> -	transpose_16byte_2x2(X2, X6, X15);
> -	transpose_16byte_2x2(X3, X7, X15);
> -	vmovdqa (STACK_TMP)(%rsp), X15;
> -	vmovdqu X0, (64 * 0 + 16 * 0)(DST)
> -	vmovdqu X1, (64 * 1 + 16 * 0)(DST)
> -	vpbroadcastd (8 * 4)(INPUT), X0;
> -	PLUS(X8, X0);
> -	vpbroadcastd (9 * 4)(INPUT), X0;
> -	PLUS(X9, X0);
> -	vpbroadcastd (10 * 4)(INPUT), X0;
> -	PLUS(X10, X0);
> -	vpbroadcastd (11 * 4)(INPUT), X0;
> -	PLUS(X11, X0);
> -	vmovdqa (STACK_VEC_X12)(%rsp), X0;
> -	PLUS(X12, X0);
> -	vmovdqa (STACK_VEC_X13)(%rsp), X0;
> -	PLUS(X13, X0);
> -	vpbroadcastd (14 * 4)(INPUT), X0;
> -	PLUS(X14, X0);
> -	vpbroadcastd (15 * 4)(INPUT), X0;
> -	PLUS(X15, X0);
> -	vmovdqu X2, (64 * 2 + 16 * 0)(DST)
> -	vmovdqu X3, (64 * 3 + 16 * 0)(DST)
> -
> -	/* Update counter */
> -	addq $8, (12 * 4)(INPUT);
> -
> -	transpose_4x4(X8, X9, X10, X11, X0, X1);
> -	transpose_4x4(X12, X13, X14, X15, X0, X1);
> -	vmovdqu X4, (64 * 4 + 16 * 0)(DST)
> -	vmovdqu X5, (64 * 5 + 16 * 0)(DST)
> -	transpose_16byte_2x2(X8, X12, X0);
> -	transpose_16byte_2x2(X9, X13, X0);
> -	transpose_16byte_2x2(X10, X14, X0);
> -	transpose_16byte_2x2(X11, X15, X0);
> -	vmovdqu X6,  (64 * 6 + 16 * 0)(DST)
> -	vmovdqu X7,  (64 * 7 + 16 * 0)(DST)
> -	vmovdqu X8,  (64 * 0 + 16 * 2)(DST)
> -	vmovdqu X9,  (64 * 1 + 16 * 2)(DST)
> -	vmovdqu X10, (64 * 2 + 16 * 2)(DST)
> -	vmovdqu X11, (64 * 3 + 16 * 2)(DST)
> -	vmovdqu X12, (64 * 4 + 16 * 2)(DST)
> -	vmovdqu X13, (64 * 5 + 16 * 2)(DST)
> -	vmovdqu X14, (64 * 6 + 16 * 2)(DST)
> -	vmovdqu X15, (64 * 7 + 16 * 2)(DST)
> -
> -	sub $8, NBLKS;
> -	lea (8 * 64)(DST), DST;
> -	lea (8 * 64)(SRC), SRC;
> -	jnz L(loop8);
> -
> -	vzeroupper;
> -
> -	/* eax zeroed by round loop. */
> -	leave;
> -	cfi_adjust_cfa_offset(-8)
> -	cfi_def_cfa_register(%rsp);
> -	ret;
> -	int3;
> -END(__chacha20_avx2_blocks8)
> diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S
> deleted file mode 100644
> index 351a1109c6..0000000000
> --- a/sysdeps/x86_64/chacha20-amd64-sse2.S
> +++ /dev/null
> @@ -1,311 +0,0 @@
> -/* Optimized SSE2 implementation of ChaCha20 cipher.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -/* chacha20-amd64-ssse3.S  -  SSSE3 implementation of ChaCha20 cipher
> -
> -   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
> -
> -   This file is part of Libgcrypt.
> -
> -   Libgcrypt is free software; you can redistribute it and/or modify
> -   it under the terms of the GNU Lesser General Public License as
> -   published by the Free Software Foundation; either version 2.1 of
> -   the License, or (at your option) any later version.
> -
> -   Libgcrypt is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -   GNU Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with this program; if not, see <https://www.gnu.org/licenses/>.
> -*/
> -
> -/* Based on D. J. Bernstein reference implementation at
> -   http://cr.yp.to/chacha.html:
> -
> -   chacha-regs.c version 20080118
> -   D. J. Bernstein
> -   Public domain.  */
> -
> -#include <sysdep.h>
> -#include <isa-level.h>
> -
> -#if MINIMUM_X86_ISA_LEVEL <= 2
> -
> -#ifdef PIC
> -#  define rRIP (%rip)
> -#else
> -#  define rRIP
> -#endif
> -
> -/* 'ret' instruction replacement for straight-line speculation mitigation */
> -#define ret_spec_stop \
> -        ret; int3;
> -
> -/* register macros */
> -#define INPUT %rdi
> -#define DST   %rsi
> -#define SRC   %rdx
> -#define NBLKS %rcx
> -#define ROUND %eax
> -
> -/* stack structure */
> -#define STACK_VEC_X12 (16)
> -#define STACK_VEC_X13 (16 + STACK_VEC_X12)
> -#define STACK_TMP     (16 + STACK_VEC_X13)
> -#define STACK_TMP1    (16 + STACK_TMP)
> -#define STACK_TMP2    (16 + STACK_TMP1)
> -
> -#define STACK_MAX     (16 + STACK_TMP2)
> -
> -/* vector registers */
> -#define X0 %xmm0
> -#define X1 %xmm1
> -#define X2 %xmm2
> -#define X3 %xmm3
> -#define X4 %xmm4
> -#define X5 %xmm5
> -#define X6 %xmm6
> -#define X7 %xmm7
> -#define X8 %xmm8
> -#define X9 %xmm9
> -#define X10 %xmm10
> -#define X11 %xmm11
> -#define X12 %xmm12
> -#define X13 %xmm13
> -#define X14 %xmm14
> -#define X15 %xmm15
> -
> -/**********************************************************************
> -  helper macros
> - **********************************************************************/
> -
> -/* 4x4 32-bit integer matrix transpose */
> -#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \
> -	movdqa    x0, t2; \
> -	punpckhdq x1, t2; \
> -	punpckldq x1, x0; \
> -	\
> -	movdqa    x2, t1; \
> -	punpckldq x3, t1; \
> -	punpckhdq x3, x2; \
> -	\
> -	movdqa     x0, x1; \
> -	punpckhqdq t1, x1; \
> -	punpcklqdq t1, x0; \
> -	\
> -	movdqa     t2, x3; \
> -	punpckhqdq x2, x3; \
> -	punpcklqdq x2, t2; \
> -	movdqa     t2, x2;
> -
> -/* fill xmm register with 32-bit value from memory */
> -#define PBROADCASTD(mem32, xreg) \
> -	movd mem32, xreg; \
> -	pshufd $0, xreg, xreg;
> -
> -/**********************************************************************
> -  4-way chacha20
> - **********************************************************************/
> -
> -#define ROTATE2(v1,v2,c,tmp1,tmp2)	\
> -	movdqa v1, tmp1; 		\
> -	movdqa v2, tmp2; 		\
> -	psrld $(32 - (c)), v1;		\
> -	pslld $(c), tmp1;		\
> -	paddb tmp1, v1;			\
> -	psrld $(32 - (c)), v2;		\
> -	pslld $(c), tmp2;		\
> -	paddb tmp2, v2;
> -
> -#define XOR(ds,s) \
> -	pxor s, ds;
> -
> -#define PLUS(ds,s) \
> -	paddd s, ds;
> -
> -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2)	\
> -	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
> -	    ROTATE2(d1, d2, 16, tmp1, tmp2);			\
> -	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
> -	    ROTATE2(b1, b2, 12, tmp1, tmp2);			\
> -	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
> -	    ROTATE2(d1, d2, 8, tmp1, tmp2);			\
> -	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
> -	    ROTATE2(b1, b2,  7, tmp1, tmp2);
> -
> -	.section .text.sse2,"ax",@progbits
> -
> -chacha20_data:
> -	.align 16
> -L(counter1):
> -	.long 1,0,0,0
> -L(inc_counter):
> -	.long 0,1,2,3
> -L(unsigned_cmp):
> -	.long 0x80000000,0x80000000,0x80000000,0x80000000
> -
> -	.hidden __chacha20_sse2_blocks4
> -ENTRY (__chacha20_sse2_blocks4)
> -	/* input:
> -	 *	%rdi: input
> -	 *	%rsi: dst
> -	 *	%rdx: src
> -	 *	%rcx: nblks (multiple of 4)
> -	 */
> -
> -	pushq %rbp;
> -	cfi_adjust_cfa_offset(8);
> -	cfi_rel_offset(rbp, 0)
> -	movq %rsp, %rbp;
> -	cfi_def_cfa_register(%rbp);
> -
> -	subq $STACK_MAX, %rsp;
> -	andq $~15, %rsp;
> -
> -L(loop4):
> -	mov $20, ROUND;
> -
> -	/* Construct counter vectors X12 and X13 */
> -	movdqa L(inc_counter) rRIP, X0;
> -	movdqa L(unsigned_cmp) rRIP, X2;
> -	PBROADCASTD((12 * 4)(INPUT), X12);
> -	PBROADCASTD((13 * 4)(INPUT), X13);
> -	paddd X0, X12;
> -	movdqa X12, X1;
> -	pxor X2, X0;
> -	pxor X2, X1;
> -	pcmpgtd X1, X0;
> -	psubd X0, X13;
> -	movdqa X12, (STACK_VEC_X12)(%rsp);
> -	movdqa X13, (STACK_VEC_X13)(%rsp);
> -
> -	/* Load vectors */
> -	PBROADCASTD((0 * 4)(INPUT), X0);
> -	PBROADCASTD((1 * 4)(INPUT), X1);
> -	PBROADCASTD((2 * 4)(INPUT), X2);
> -	PBROADCASTD((3 * 4)(INPUT), X3);
> -	PBROADCASTD((4 * 4)(INPUT), X4);
> -	PBROADCASTD((5 * 4)(INPUT), X5);
> -	PBROADCASTD((6 * 4)(INPUT), X6);
> -	PBROADCASTD((7 * 4)(INPUT), X7);
> -	PBROADCASTD((8 * 4)(INPUT), X8);
> -	PBROADCASTD((9 * 4)(INPUT), X9);
> -	PBROADCASTD((10 * 4)(INPUT), X10);
> -	PBROADCASTD((11 * 4)(INPUT), X11);
> -	PBROADCASTD((14 * 4)(INPUT), X14);
> -	PBROADCASTD((15 * 4)(INPUT), X15);
> -	movdqa X11, (STACK_TMP)(%rsp);
> -	movdqa X15, (STACK_TMP1)(%rsp);
> -
> -L(round2_4):
> -	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X11,X15)
> -	movdqa (STACK_TMP)(%rsp), X11;
> -	movdqa (STACK_TMP1)(%rsp), X15;
> -	movdqa X8, (STACK_TMP)(%rsp);
> -	movdqa X9, (STACK_TMP1)(%rsp);
> -	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,X9)
> -	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,X9)
> -	movdqa (STACK_TMP)(%rsp), X8;
> -	movdqa (STACK_TMP1)(%rsp), X9;
> -	movdqa X11, (STACK_TMP)(%rsp);
> -	movdqa X15, (STACK_TMP1)(%rsp);
> -	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X11,X15)
> -	sub $2, ROUND;
> -	jnz L(round2_4);
> -
> -	/* tmp := X15 */
> -	movdqa (STACK_TMP)(%rsp), X11;
> -	PBROADCASTD((0 * 4)(INPUT), X15);
> -	PLUS(X0, X15);
> -	PBROADCASTD((1 * 4)(INPUT), X15);
> -	PLUS(X1, X15);
> -	PBROADCASTD((2 * 4)(INPUT), X15);
> -	PLUS(X2, X15);
> -	PBROADCASTD((3 * 4)(INPUT), X15);
> -	PLUS(X3, X15);
> -	PBROADCASTD((4 * 4)(INPUT), X15);
> -	PLUS(X4, X15);
> -	PBROADCASTD((5 * 4)(INPUT), X15);
> -	PLUS(X5, X15);
> -	PBROADCASTD((6 * 4)(INPUT), X15);
> -	PLUS(X6, X15);
> -	PBROADCASTD((7 * 4)(INPUT), X15);
> -	PLUS(X7, X15);
> -	PBROADCASTD((8 * 4)(INPUT), X15);
> -	PLUS(X8, X15);
> -	PBROADCASTD((9 * 4)(INPUT), X15);
> -	PLUS(X9, X15);
> -	PBROADCASTD((10 * 4)(INPUT), X15);
> -	PLUS(X10, X15);
> -	PBROADCASTD((11 * 4)(INPUT), X15);
> -	PLUS(X11, X15);
> -	movdqa (STACK_VEC_X12)(%rsp), X15;
> -	PLUS(X12, X15);
> -	movdqa (STACK_VEC_X13)(%rsp), X15;
> -	PLUS(X13, X15);
> -	movdqa X13, (STACK_TMP)(%rsp);
> -	PBROADCASTD((14 * 4)(INPUT), X15);
> -	PLUS(X14, X15);
> -	movdqa (STACK_TMP1)(%rsp), X15;
> -	movdqa X14, (STACK_TMP1)(%rsp);
> -	PBROADCASTD((15 * 4)(INPUT), X13);
> -	PLUS(X15, X13);
> -	movdqa X15, (STACK_TMP2)(%rsp);
> -
> -	/* Update counter */
> -	addq $4, (12 * 4)(INPUT);
> -
> -	TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15);
> -	movdqu X0, (64 * 0 + 16 * 0)(DST)
> -	movdqu X1, (64 * 1 + 16 * 0)(DST)
> -	movdqu X2, (64 * 2 + 16 * 0)(DST)
> -	movdqu X3, (64 * 3 + 16 * 0)(DST)
> -	TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2);
> -	movdqa (STACK_TMP)(%rsp), X13;
> -	movdqa (STACK_TMP1)(%rsp), X14;
> -	movdqa (STACK_TMP2)(%rsp), X15;
> -	movdqu X4, (64 * 0 + 16 * 1)(DST)
> -	movdqu X5, (64 * 1 + 16 * 1)(DST)
> -	movdqu X6, (64 * 2 + 16 * 1)(DST)
> -	movdqu X7, (64 * 3 + 16 * 1)(DST)
> -	TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2);
> -	movdqu X8,  (64 * 0 + 16 * 2)(DST)
> -	movdqu X9,  (64 * 1 + 16 * 2)(DST)
> -	movdqu X10, (64 * 2 + 16 * 2)(DST)
> -	movdqu X11, (64 * 3 + 16 * 2)(DST)
> -	TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2);
> -	movdqu X12, (64 * 0 + 16 * 3)(DST)
> -	movdqu X13, (64 * 1 + 16 * 3)(DST)
> -	movdqu X14, (64 * 2 + 16 * 3)(DST)
> -	movdqu X15, (64 * 3 + 16 * 3)(DST)
> -
> -	sub $4, NBLKS;
> -	lea (4 * 64)(DST), DST;
> -	lea (4 * 64)(SRC), SRC;
> -	jnz L(loop4);
> -
> -	/* eax zeroed by round loop. */
> -	leave;
> -	cfi_adjust_cfa_offset(-8)
> -	cfi_def_cfa_register(%rsp);
> -	ret_spec_stop;
> -END (__chacha20_sse2_blocks4)
> -
> -#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */

Ok.

> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
> deleted file mode 100644
> index 6f3784e392..0000000000
> --- a/sysdeps/x86_64/chacha20_arch.h
> +++ /dev/null
> @@ -1,55 +0,0 @@
> -/* Chacha20 implementation, used on arc4random.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <isa-level.h>
> -#include <ldsodefs.h>
> -#include <cpu-features.h>
> -#include <sys/param.h>
> -
> -unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst,
> -				      const uint8_t *src, size_t nblks)
> -     attribute_hidden;
> -unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
> -				      const uint8_t *src, size_t nblks)
> -     attribute_hidden;
> -
> -static inline void
> -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> -		size_t bytes)
> -{
> -  _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0,
> -		  "CHACHA20_BUFSIZE not multiple of 4 or 8");
> -  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
> -		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
> -
> -#if MINIMUM_X86_ISA_LEVEL > 2
> -  __chacha20_avx2_blocks8 (state, dst, src,
> -			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -#else
> -  const struct cpu_features* cpu_features = __get_cpu_features ();
> -
> -  /* AVX2 version uses vzeroupper, so disable it if RTM is enabled.  */
> -  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> -      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !))
> -    __chacha20_avx2_blocks8 (state, dst, src,
> -			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -  else
> -    __chacha20_sse2_blocks4 (state, dst, src,
> -			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -#endif
> -}

Ok.


* Re: [PATCH v6] arc4random: simplify design for better safety
  2022-07-26 20:17           ` Adhemerval Zanella Netto
@ 2022-07-26 20:56             ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 81+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-26 20:56 UTC (permalink / raw)
  To: Jason A. Donenfeld, libc-alpha
  Cc: Florian Weimer, Cristian Rodríguez, Paul Eggert,
	Mark Harris, Eric Biggers, linux-crypto



On 26/07/22 17:17, Adhemerval Zanella Netto wrote:
> 
> 
> On 26/07/22 16:58, Jason A. Donenfeld wrote:
>> Rather than buffering 16 MiB of entropy in userspace (by way of
>> chacha20), simply call getrandom() every time.
>>
>> This approach is doubtlessly slower, for now, but trying to prematurely
>> optimize arc4random appears to be leading toward all sorts of nasty
>> properties and gotchas. Instead, this patch takes a much more
>> conservative approach. The interface is added as a basic loop wrapper
>> around getrandom(), and then later, the kernel and libc can
>> work together on optimizing that.
>>
>> This prevents numerous issues in which userspace is unaware of when it
>> really must throw away its buffer, since we avoid buffering
>> altogether. Future improvements may include userspace learning more from
>> the kernel about when to do that, which might make these sorts of
>> chacha20-based optimizations more possible. The current heuristic of 16
>> MiB is meaningless garbage that doesn't correspond to anything the
>> kernel might know about. So for now, let's just do something
>> conservative that we know is correct and won't lead to cryptographic
>> issues for users of this function.
>>
>> This patch might be considered along the lines of, "optimization is the
>> root of all evil," in that the much more complex implementation it
>> replaces moves too fast without considering security implications,
>> whereas the incremental approach done here is a much safer way of going
>> about things. Once this lands, we can take our time in optimizing this
>> properly using new interplay between the kernel and userspace.
>>
>> getrandom(0) is used, since that's the one that ensures the bytes
>> returned are cryptographically secure. But on systems without it, we
>> fall back to using /dev/urandom. This is unfortunate because it means
>> opening a file descriptor, but there's not much of a choice. Secondly,
>> as part of the fallback, in order to get more or less the same
>> properties of getrandom(0), we poll on /dev/random, and if the poll
>> succeeds at least once, then we assume the RNG is initialized. This is a
>> rough approximation, as the ancient "non-blocking pool" initialized
>> after the "blocking pool", not before, and it may not port back to all
>> ancient kernels, though it does to all kernels supported by glibc
>> (≥3.2), so generally it's the best approximation we can do.
>>
>> The motivation for including arc4random, in the first place, is to have
>> source-level compatibility with existing code. That means this patch
>> doesn't attempt to litigate the interface itself. It does, however,
>> choose a conservative approach for implementing it.
> 
> LGTM, I agree this is a safe solution for 2.36; we can optimize it later
> if that turns out to be needed.
> 
> I will run some tests and push it upstream.
> 
> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
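
For readers following along, the "basic loop wrapper around getrandom()"
described in the commit message can be sketched roughly as below. This is
an illustration only, not the code from the patch: the names
sketch_random_buf and sketch_random_u32 are made up, and aborting on an
unexpected error is an assumption about how one might refuse to return
weak bytes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper (not the glibc function): fill BUF with N bytes
   from getrandom(), retrying on short reads and on EINTR.  */
static void
sketch_random_buf (void *buf, size_t n)
{
  unsigned char *p = buf;

  while (n > 0)
    {
      /* Flags of 0 block until the kernel RNG has been initialized.  */
      ssize_t r = getrandom (p, n, 0);
      if (r > 0)
        {
          p += r;
          n -= (size_t) r;
          continue;
        }
      if (r < 0 && errno == EINTR)
        continue;
      /* Assumption: any other failure is fatal, so that the caller
         never sees a partially filled buffer.  */
      abort ();
    }
}

/* Hypothetical helper: one 32-bit value, with no userspace buffering.  */
static uint32_t
sketch_random_u32 (void)
{
  uint32_t v;
  sketch_random_buf (&v, sizeof v);
  return v;
}
```

Every call goes to the kernel, so there is no userspace state to discard
on fork or VM snapshot; the cost is one syscall per request.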

And I think we will need to tune down the internal parameters of
stdlib/tst-arc4random-thread, because the test now takes about 1 minute
on my testing machine (which has a fairly recent processor).  I will send
a patch to adjust the maximum number of threads depending on the number
of configured CPUs (to avoid syscall contention).
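
As a rough idea of what capping the thread count by configured CPUs could
look like (a sketch only, not the actual test patch; the function name
sketch_thread_count and its max_threads parameter are hypothetical):

```c
#include <unistd.h>

/* Hypothetical helper: choose how many threads a stress test should
   start, using at most MAX_THREADS and never more than the number of
   configured CPUs, so threads do not pile up on the same syscall.  */
static int
sketch_thread_count (int max_threads)
{
  long ncpu = sysconf (_SC_NPROCESSORS_CONF);

  /* sysconf can return -1 on error; fall back to a single CPU.  */
  if (ncpu < 1)
    ncpu = 1;

  return ncpu < max_threads ? (int) ncpu : max_threads;
}
```

A test would then start sketch_thread_count (N) workers, where N is the
limit it uses today.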

* Re: [PATCH v6] arc4random: simplify design for better safety
  2022-07-26 19:58         ` [PATCH v6] " Jason A. Donenfeld
  2022-07-26 20:17           ` Adhemerval Zanella Netto
@ 2022-07-28 10:29           ` Szabolcs Nagy
  2022-07-28 10:36             ` Szabolcs Nagy
  1 sibling, 1 reply; 81+ messages in thread
From: Szabolcs Nagy @ 2022-07-28 10:29 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: libc-alpha, adhemerval.zanella, Florian Weimer, Eric Biggers,
	linux-crypto

The 07/26/2022 21:58, Jason A. Donenfeld via Libc-alpha wrote:
> Rather than buffering 16 MiB of entropy in userspace (by way of
> chacha20), simply call getrandom() every time.
> 
> This approach is doubtlessly slower, for now, but trying to prematurely
> optimize arc4random appears to be leading toward all sorts of nasty
> properties and gotchas. Instead, this patch takes a much more
> conservative approach. The interface is added as a basic loop wrapper
> around getrandom(), and then later, the kernel and libc can
> work together on optimizing that.
> 
> This prevents numerous issues in which userspace is unaware of when it
> really must throw away its buffer, since we avoid buffering
> altogether. Future improvements may include userspace learning more from
> the kernel about when to do that, which might make these sorts of
> chacha20-based optimizations more possible. The current heuristic of 16
> MiB is meaningless garbage that doesn't correspond to anything the
> kernel might know about. So for now, let's just do something
> conservative that we know is correct and won't lead to cryptographic
> issues for users of this function.
> 
> This patch might be considered along the lines of, "optimization is the
> root of all evil," in that the much more complex implementation it
> replaces moves too fast without considering security implications,
> whereas the incremental approach done here is a much safer way of going
> about things. Once this lands, we can take our time in optimizing this
> properly using new interplay between the kernel and userspace.
> 
> getrandom(0) is used, since that's the one that ensures the bytes
> returned are cryptographically secure. But on systems without it, we
> fall back to using /dev/urandom. This is unfortunate because it means
> opening a file descriptor, but there's not much of a choice. Secondly,
> as part of the fallback, in order to get more or less the same
> properties of getrandom(0), we poll on /dev/random, and if the poll
> succeeds at least once, then we assume the RNG is initialized. This is a
> rough approximation, as the ancient "non-blocking pool" initialized
> after the "blocking pool", not before, and it may not port back to all
> ancient kernels, though it does to all kernels supported by glibc
> (≥3.2), so generally it's the best approximation we can do.
> 
> The motivation for including arc4random, in the first place, is to have
> source-level compatibility with existing code. That means this patch
> doesn't attempt to litigate the interface itself. It does, however,
> choose a conservative approach for implementing it.
> 
> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Florian Weimer <fweimer@redhat.com>
> Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
> Cc: Paul Eggert <eggert@cs.ucla.edu>
> Cc: Mark Harris <mark.hsj@gmail.com>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Cc: linux-crypto@vger.kernel.org
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

fyi, after this patch i see

FAIL: stdlib/tst-arc4random-thread

with

$ cat stdlib/tst-arc4random-thread.out
info: arc4random: minimum of 1750000 blob results expected
info: arc4random: 1750777 blob results observed
info: arc4random_buf: minimum of 1750000 blob results expected
info: arc4random_buf: 1750000 blob results observed
info: arc4random_uniform: minimum of 1750000 blob results expected
Timed out: killed the child process
Termination time: 2022-07-27T14:41:33.766791947
Last write to standard output: 2022-07-27T14:41:22.522497854

on an arm and aarch64 builder.

running it manually it takes >30s to complete.

> ---
>  LICENSES                                      |  23 -
>  NEWS                                          |   4 +-
>  include/stdlib.h                              |   3 -
>  manual/math.texi                              |  13 +-
>  stdlib/Makefile                               |   2 -
>  stdlib/arc4random.c                           | 196 ++----
>  stdlib/arc4random.h                           |  48 --
>  stdlib/chacha20.c                             | 191 ------
>  stdlib/tst-arc4random-chacha20.c              | 167 -----
>  sysdeps/aarch64/Makefile                      |   4 -
>  sysdeps/aarch64/chacha20-aarch64.S            | 314 ----------
>  sysdeps/aarch64/chacha20_arch.h               |  40 --
>  sysdeps/generic/chacha20_arch.h               |  24 -
>  sysdeps/generic/not-cancel.h                  |   3 +
>  sysdeps/generic/tls-internal-struct.h         |   1 -
>  sysdeps/generic/tls-internal.c                |  10 -
>  sysdeps/mach/hurd/_Fork.c                     |   2 -
>  sysdeps/mach/hurd/not-cancel.h                |   4 +
>  sysdeps/nptl/_Fork.c                          |   2 -
>  .../powerpc/powerpc64/be/multiarch/Makefile   |   4 -
>  .../powerpc64/be/multiarch/chacha20-ppc.c     |   1 -
>  .../powerpc64/be/multiarch/chacha20_arch.h    |  42 --
>  sysdeps/powerpc/powerpc64/power8/Makefile     |   5 -
>  .../powerpc/powerpc64/power8/chacha20-ppc.c   | 256 --------
>  .../powerpc/powerpc64/power8/chacha20_arch.h  |  37 --
>  sysdeps/s390/s390-64/Makefile                 |   6 -
>  sysdeps/s390/s390-64/chacha20-s390x.S         | 573 ------------------
>  sysdeps/s390/s390-64/chacha20_arch.h          |  45 --
>  sysdeps/unix/sysv/linux/not-cancel.h          |   8 +-
>  sysdeps/unix/sysv/linux/tls-internal.c        |  10 -
>  sysdeps/unix/sysv/linux/tls-internal.h        |   1 -
>  sysdeps/x86_64/Makefile                       |   7 -
>  sysdeps/x86_64/chacha20-amd64-avx2.S          | 328 ----------
>  sysdeps/x86_64/chacha20-amd64-sse2.S          | 311 ----------
>  sysdeps/x86_64/chacha20_arch.h                |  55 --
>  35 files changed, 64 insertions(+), 2676 deletions(-)
>  delete mode 100644 stdlib/arc4random.h
>  delete mode 100644 stdlib/chacha20.c
>  delete mode 100644 stdlib/tst-arc4random-chacha20.c
>  delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S
>  delete mode 100644 sysdeps/aarch64/chacha20_arch.h
>  delete mode 100644 sysdeps/generic/chacha20_arch.h
>  delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
>  delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
>  delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
>  delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
>  delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
>  delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
>  delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
>  delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
>  delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
>  delete mode 100644 sysdeps/x86_64/chacha20_arch.h

* Re: [PATCH v6] arc4random: simplify design for better safety
  2022-07-28 10:29           ` Szabolcs Nagy
@ 2022-07-28 10:36             ` Szabolcs Nagy
  2022-07-28 11:01               ` Adhemerval Zanella
  0 siblings, 1 reply; 81+ messages in thread
From: Szabolcs Nagy @ 2022-07-28 10:36 UTC (permalink / raw)
  To: Jason A. Donenfeld, Florian Weimer, Eric Biggers, libc-alpha,
	linux-crypto

The 07/28/2022 11:29, Szabolcs Nagy via Libc-alpha wrote:
> The 07/26/2022 21:58, Jason A. Donenfeld via Libc-alpha wrote:
...
> 
> fyi, after this patch i see
> 
> FAIL: stdlib/tst-arc4random-thread
> 
> with
> 
> $ cat stdlib/tst-arc4random-thread.out
> info: arc4random: minimum of 1750000 blob results expected
> info: arc4random: 1750777 blob results observed
> info: arc4random_buf: minimum of 1750000 blob results expected
> info: arc4random_buf: 1750000 blob results observed
> info: arc4random_uniform: minimum of 1750000 blob results expected
> Timed out: killed the child process
> Termination time: 2022-07-27T14:41:33.766791947
> Last write to standard output: 2022-07-27T14:41:22.522497854
> 
> on an arm and aarch64 builder.
> 
> running it manually it takes >30s to complete.

note that before the patch it was <5s on the same machine.

* Re: [PATCH v6] arc4random: simplify design for better safety
  2022-07-28 10:36             ` Szabolcs Nagy
@ 2022-07-28 11:01               ` Adhemerval Zanella
  0 siblings, 0 replies; 81+ messages in thread
From: Adhemerval Zanella @ 2022-07-28 11:01 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Jason A. Donenfeld, Florian Weimer, Eric Biggers, libc-alpha,
	linux-crypto

On Thu, Jul 28, 2022 at 7:37 AM Szabolcs Nagy via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> The 07/28/2022 11:29, Szabolcs Nagy via Libc-alpha wrote:
> > The 07/26/2022 21:58, Jason A. Donenfeld via Libc-alpha wrote:
> ...
> >
> > fyi, after this patch i see
> >
> > FAIL: stdlib/tst-arc4random-thread
> >
> > with
> >
> > $ cat stdlib/tst-arc4random-thread.out
> > info: arc4random: minimum of 1750000 blob results expected
> > info: arc4random: 1750777 blob results observed
> > info: arc4random_buf: minimum of 1750000 blob results expected
> > info: arc4random_buf: 1750000 blob results observed
> > info: arc4random_uniform: minimum of 1750000 blob results expected
> > Timed out: killed the child process
> > Termination time: 2022-07-27T14:41:33.766791947
> > Last write to standard output: 2022-07-27T14:41:22.522497854
> >
> > on an arm and aarch64 builder.
> >
> > running it manually it takes >30s to complete.
>
> note that before the patch it was <5s on the same machine.

Yep, we need to tune down the internal test parameters [1].

[1] https://patchwork.sourceware.org/project/glibc/patch/20220727131031.2016648-1-adhemerval.zanella@linaro.org/

Thread overview: 81+ messages
     [not found] <YtwgTySJyky0OcgG@zx2c4.com>
2022-07-23 16:25 ` arc4random - are you sure we want these? Jason A. Donenfeld
2022-07-23 17:18   ` Paul Eggert
2022-07-24 23:55     ` Jason A. Donenfeld
2022-07-25 20:31       ` Paul Eggert
2022-07-23 17:39   ` Adhemerval Zanella Netto
2022-07-23 22:54     ` Jason A. Donenfeld
2022-07-25 15:33     ` Rich Felker
2022-07-25 15:59       ` Adhemerval Zanella Netto
2022-07-25 17:41         ` Rich Felker
2022-07-25 16:18       ` Sandy Harris
2022-07-25 16:40       ` Florian Weimer
2022-07-25 16:49         ` Adhemerval Zanella Netto
2022-07-25 16:51         ` Jason A. Donenfeld
2022-07-25 17:44         ` Rich Felker
2022-07-25 18:33           ` Cristian Rodríguez
2022-07-25 18:49             ` Rich Felker
2022-07-27  1:54               ` Theodore Ts'o
2022-07-27  2:16                 ` Rich Felker
2022-07-27  2:45                   ` Theodore Ts'o
2022-07-27 11:34                 ` Adhemerval Zanella Netto
2022-07-27 12:32                   ` Theodore Ts'o
2022-07-27 12:49                     ` Florian Weimer
2022-07-27 20:15                       ` Theodore Ts'o
2022-07-27 21:59                         ` Rich Felker
2022-07-28  0:30                           ` Theodore Ts'o
2022-07-28  0:39                         ` Cristian Rodríguez
2022-07-27 15:39                   ` Rich Felker
2022-07-23 19:04   ` Cristian Rodríguez
2022-07-23 22:59     ` Jason A. Donenfeld
2022-07-24 16:23       ` Cristian Rodríguez
2022-07-24 21:57         ` Jason A. Donenfeld
2022-07-25 10:14     ` Florian Weimer
2022-07-25 10:11   ` Florian Weimer
2022-07-25 11:04     ` Jason A. Donenfeld
2022-07-25 12:39       ` Florian Weimer
2022-07-25 13:43         ` Jason A. Donenfeld
2022-07-25 13:58           ` Cristian Rodríguez
2022-07-25 16:06           ` Rich Felker
2022-07-25 16:43             ` Florian Weimer
2022-07-26 14:27         ` Overwrittting AT_RANDOM after use (was Re: arc4random - are you sure we want these?) Yann Droneaud
2022-07-26 14:35         ` arc4random - are you sure we want these? Yann Droneaud
2022-07-25 13:25       ` Jeffrey Walton
2022-07-25 13:48         ` Jason A. Donenfeld
2022-07-25 14:56     ` Rich Felker
2022-07-25 22:57   ` [PATCH] arc4random: simplify design for better safety Jason A. Donenfeld
2022-07-25 23:11     ` Jason A. Donenfeld
2022-07-25 23:28     ` [PATCH v2] " Jason A. Donenfeld
2022-07-25 23:59       ` Eric Biggers
2022-07-26 10:26         ` Jason A. Donenfeld
2022-07-26  1:10       ` Mark Harris
2022-07-26 10:41         ` Jason A. Donenfeld
2022-07-26 11:06           ` Florian Weimer
2022-07-26 16:51           ` Mark Harris
2022-07-26 18:42             ` Jason A. Donenfeld
2022-07-26 19:18               ` Adhemerval Zanella Netto
2022-07-26 19:24               ` Jason A. Donenfeld
2022-07-26  9:55       ` Florian Weimer
2022-07-26 11:04         ` Jason A. Donenfeld
2022-07-26 11:07           ` [PATCH v3] " Jason A. Donenfeld
2022-07-26 11:11             ` Jason A. Donenfeld
2022-07-26 11:12           ` [PATCH v2] " Florian Weimer
2022-07-26 11:20             ` Jason A. Donenfeld
2022-07-26 11:35               ` Adhemerval Zanella Netto
2022-07-26 11:33       ` Adhemerval Zanella Netto
2022-07-26 11:54         ` Jason A. Donenfeld
2022-07-26 12:08           ` Jason A. Donenfeld
2022-07-26 12:20           ` Jason A. Donenfeld
2022-07-26 12:34           ` Adhemerval Zanella Netto
2022-07-26 12:47             ` Jason A. Donenfeld
2022-07-26 13:11               ` Adhemerval Zanella Netto
2022-07-26 13:30     ` [PATCH v4] " Jason A. Donenfeld
2022-07-26 15:21       ` Yann Droneaud
2022-07-26 16:20       ` Adhemerval Zanella Netto
2022-07-26 18:36         ` Jason A. Donenfeld
2022-07-26 19:08       ` [PATCH v5] " Jason A. Donenfeld
2022-07-26 19:58         ` [PATCH v6] " Jason A. Donenfeld
2022-07-26 20:17           ` Adhemerval Zanella Netto
2022-07-26 20:56             ` Adhemerval Zanella Netto
2022-07-28 10:29           ` Szabolcs Nagy
2022-07-28 10:36             ` Szabolcs Nagy
2022-07-28 11:01               ` Adhemerval Zanella
