* arc4random - are you sure we want these? [not found] <YtwgTySJyky0OcgG@zx2c4.com> @ 2022-07-23 16:25 ` Jason A. Donenfeld 2022-07-23 17:18 ` Paul Eggert ` (4 more replies) 0 siblings, 5 replies; 81+ messages in thread From: Jason A. Donenfeld @ 2022-07-23 16:25 UTC (permalink / raw) To: libc-alpha, Adhemerval Zanella Netto, Florian Weimer, Yann Droneaud, jann, Michael [Resending to right address.] Hi glibc developers, I learned about the addition of the arc4random functions in glibc this morning, thanks to Phoronix. I wish somebody would have CC'd me into those discussions before it got committed, but here we are. I really wonder whether this is a good idea, whether this is something that glibc wants, and whether it's a design worth committing to in the long term. Firstly, for what use cases does this actually help? As of recent changes to the Linux kernels -- now backported all the way to 4.9! -- getrandom() and /dev/urandom are extremely fast and operate over per-cpu states locklessly. Sure you avoid a syscall by doing that in userspace, but does it really matter? Who exactly benefits from this? Seen that way, it seems like a lot of complexity for nothing, and complexity that will lead to bugs and various oversights eventually. For example, the kernel reseeds itself when virtual machines fork using an identifier passed to the kernel via ACPI. It also reseeds itself on system resume, both from ordinary S3 sleep but also, more importantly, from hibernation. And in general, being the arbiter of entropy, the kernel is much better poised to determine when it makes sense to reseed. Glibc, on the other hand, can employ some heuristics and make some decisions -- on fork, after 16 MiB, and the like -- but in general these are lacking, compared to the much wider array of information the kernel has. 
You miss out on this with arc4random, and if that information _is_ to be exported to userspace somehow in the future, it would be awfully nice to design the userspace interface alongside the kernel one. For that reason, past discussion of having some random number generation in userspace libcs has been geared toward doing this in the vDSO, somehow, where the kernel can be part and parcel of that effort. Seen from this perspective, going with OpenBSD's older paradigm might be rather limiting. Why not work together, between the kernel and libc, to see if we can come up with something better, before settling on an interface with semantics that are hard to walk back later? As-is, it's hard to recommend that anybody really use these functions. Just keep using getrandom(2), which has mostly favorable semantics. Yes, I get it: it's fun to make a random number generator, and so lots of projects figure out some way to make yet another one somewhere somehow. But the tendency to do so feels like a weird computer tinkerer disease rather than something that has ever helped the overall ecosystem. So I'm wondering: who actually needs this, and why? What's the performance requirement like, and why is getrandom(2) insufficient? And is this really the best approach to take? If this is something needed, how would you feel about working together on a vDSO approach instead? Or maybe nobody actually needs this in the first place? And secondly, is there any way that glibc can *not* do this, or has that ship fully sailed, and I really missed out by not being part of that discussion whenever it was happening? Thanks, Jason ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-23 16:25 ` arc4random - are you sure we want these? Jason A. Donenfeld @ 2022-07-23 17:18 ` Paul Eggert 2022-07-24 23:55 ` Jason A. Donenfeld 2022-07-23 17:39 ` Adhemerval Zanella Netto ` (3 subsequent siblings) 4 siblings, 1 reply; 81+ messages in thread From: Paul Eggert @ 2022-07-23 17:18 UTC (permalink / raw) To: Jason A. Donenfeld; +Cc: libc-alpha On 7/23/22 09:25, Jason A. Donenfeld via Libc-alpha wrote: > it's hard to recommend that anybody really use these functions. > Just keep using getrandom(2), which has mostly favorable semantics. Yes, that's what I plan to do in GNU projects like Coreutils and Emacs. Although I don't recommend arc4random, I suppose it was added for source-code compatibility with the BSDs (I wasn't involved in the decision). > is there anyway that glibc can *not* do this, or has that > ship fully sailed It hasn't fully sailed since we haven't done a release. > it's fun to make a random number generator, and so lots > of projects figure out some way to make yet another one somewhere > somehow. That's a bit harsh. Coreutils still has its own random number generator because it needed to be portable to a bunch of platforms and there was no standard. Eventually we'll rip it out but there's no rush. Having written much of that code I can reliably assert that it was not fun. ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-23 17:18 ` Paul Eggert @ 2022-07-24 23:55 ` Jason A. Donenfeld 2022-07-25 20:31 ` Paul Eggert 0 siblings, 1 reply; 81+ messages in thread From: Jason A. Donenfeld @ 2022-07-24 23:55 UTC (permalink / raw) To: Paul Eggert; +Cc: libc-alpha, linux-crypto Hi Paul, Sorry I missed your reply earlier. I'm not a subscriber so I missed this as I somehow fell out of the CC. On Sat, Jul 23, 2022 at 05:18:05PM +0000, Paul Eggert wrote: > On 7/23/22 09:25, Jason A. Donenfeld via Libc-alpha wrote: > > it's hard to recommend that anybody really use these functions. > > Just keep using getrandom(2), which has mostly favorable semantics. > > Yes, that's what I plan to do in GNU projects like Coreutils and Emacs. > > Although I don't recommend arc4random, I suppose it was added for > source-code compatibility with the BSDs (I wasn't involved in the decision). Source code compatibility isn't exactly a bad goal. But according to Adhemerval you don't plan on this being a secure thing -- hence mentioning as such in the documentation as he mentioned -- so it seems like a maybe-okay goal gone bad. But, anyway, if the goal is just basic source code compatibility, back it with simple calls to getrandom() to start, and if later there are performance issues (big if!), we can look into vDSO tricks and such to speed that up. There's no need to add a whole new huge fraught mechanism for that. > > is there anyway that glibc can *not* do this, or has that > > ship fully sailed > > It hasn't fully sailed since we haven't done a release. Well that's good. I'd recommend just backing it out until it can be done in a way that glibc developers feel comfortable calling safe (and others too, of course, but at the very least you don't want to start out making something you feel the need to warn about in the documentation). > That's a bit harsh. 
Coreutils still has its own random number generator > because it needed to be portable to a bunch of platforms and there was > no standard. Eventually we'll rip it out but there's no rush. Having > written much of that code I can reliably assert that it was not fun. I'm happy to help with this if you need. I recently cleaned up some stuff similar sounding in systemd for their uses; random-util.c there might be of interest. Jason ^ permalink raw reply [flat|nested] 81+ messages in thread
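A minimal sketch of the suggestion above, that an arc4random-style interface could simply be backed by calls to getrandom(). This is not the glibc implementation; the names sketch_arc4random_buf and sketch_arc4random are hypothetical, and the choice to abort() on any error other than EINTR is an assumption made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical name: fill buf with len bytes taken straight from the kernel. */
void sketch_arc4random_buf(void *buf, size_t len)
{
    unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        /* flags == 0: block until the kernel RNG has been initialized. */
        ssize_t n = getrandom(p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            abort();        /* assumption: fail hard, no silent fallback */
        }
        /* Large requests may be filled partially; advance and loop. */
        p += n;
        len -= (size_t)n;
    }
}

/* Hypothetical name: one 32-bit value, mirroring the arc4random() signature. */
uint32_t sketch_arc4random(void)
{
    uint32_t v;

    sketch_arc4random_buf(&v, sizeof v);
    return v;
}
```

With this shape there is no userspace state to reseed on fork, VM clone or resume; every call pays for a system call, which is the cost the performance question in this thread is about.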
* Re: arc4random - are you sure we want these? 2022-07-24 23:55 ` Jason A. Donenfeld @ 2022-07-25 20:31 ` Paul Eggert 0 siblings, 0 replies; 81+ messages in thread From: Paul Eggert @ 2022-07-25 20:31 UTC (permalink / raw) To: Jason A. Donenfeld; +Cc: libc-alpha, linux-crypto On 7/24/22 16:55, Jason A. Donenfeld wrote: > Sorry I missed your reply earlier. I'm not a subscriber so I missed this > as I somehow fell out of the CC. Your email provider (Google) rejected email from cs.ucla.edu on the grounds that its IP address 131.179.128.68 has a "very low reputation". Google provided no way to appeal or fix the problem. I am using "Reply All" for this message because Google likely won't deliver it to you directly. Perhaps someone else can forward it to you for me. (Sorry to bother the list.) Perhaps this is a subtle way to encourage our department's faculty to let Google manage our email. We've resisted so far, though. ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-23 16:25 ` arc4random - are you sure we want these? Jason A. Donenfeld 2022-07-23 17:18 ` Paul Eggert @ 2022-07-23 17:39 ` Adhemerval Zanella Netto 2022-07-23 22:54 ` Jason A. Donenfeld 2022-07-25 15:33 ` Rich Felker 2022-07-23 19:04 ` Cristian Rodríguez ` (2 subsequent siblings) 4 siblings, 2 replies; 81+ messages in thread From: Adhemerval Zanella Netto @ 2022-07-23 17:39 UTC (permalink / raw) To: Jason A. Donenfeld, libc-alpha, Florian Weimer, Yann Droneaud, jann, Michael, Paul Eggert On 23/07/22 13:25, Jason A. Donenfeld wrote: > [Resending to right address.] > > Hi glibc developers, > > I learned about the addition of the arc4random functions in glibc this > morning, thanks to Phoronix. I wish somebody would have CC'd me into > those discussions before it got committed, but here we are. Florian sent the initial version about four years ago to libc-alpha (libc-alpha@sourceware.org). This is the mailing list used for glibc development, RFCs, and general discussion. > > I really wonder whether this is a good idea, whether this is something > that glibc wants, and whether it's a design worth committing to in the > long term. I think so; this is something developers have been asking us for since 2007 [1], and it is used and has been ported to multiple OSes (OpenBSD, FreeBSD, Mac OS X). > > Firstly, for what use cases does this actually help? As of recent > changes to the Linux kernels -- now backported all the way to 4.9! -- > getrandom() and /dev/urandom are extremely fast and operate over per-cpu > states locklessly. Sure you avoid a syscall by doing that in userspace, > but does it really matter? Who exactly benefits from this? Mainly performance, since glibc exports both getrandom and getentropy. There was some discussion on the mailing list, and we also decided to explicitly state in our documentation that this is not a CSPRNG. 
> > Seen that way, it seems like a lot of complexity for nothing, and > complexity that will lead to bugs and various oversights eventually. > > For example, the kernel reseeds itself when virtual machines fork using > an identifier passed to the kernel via ACPI. It also reseeds itself on > system resume, both from ordinary S3 sleep but also, more importantly, > from hibernation. And in general, being the arbiter of entropy, the > kernel is much better poised to determine when it makes sense to reseed. > > Glibc, on the other hand, can employ some heuristics and make some > decisions -- on fork, after 16 MiB, and the like -- but in general these > are lacking, compared to the much wider array of information the kernel > has. > > You miss out on this with arc4random, and if that information _is_ to be > exported to userspace somehow in the future, it would be awfully nice to > design the userspace interface alongside the kernel one. > > For that reason, past discussion of having some random number generation > in userspace libcs has geared toward doing this in the vDSO, somehow, > where the kernel can be part and parcel of that effort. > > Seen from this perspective, going with OpenBSD's older paradigm might be > rather limiting. Why not work together, between the kernel and libc, to > see if we can come up with something better, before settling on an > interface with semantics that are hard to walk back later? Mainly because there are some programs out there that can still benefit from a widespread interface instead of relying on a not-yet-implemented interface that will only be available in a future kernel. But at the same time, nothing prevents us from either using the vDSO-like interface, improving our implementation with better heuristics, or even using a different cipher algorithm. There has even been some discussion of making arc4random fall back to getrandom if a tunable is set or if the kernel is configured in some strict mode. 
> > As-is, it's hard to recommend that anybody really use these functions. > Just keep using getrandom(2), which has mostly favorable semantics. > > Yes, I get it: it's fun to make a random number generator, and so lots > of projects figure out some way to make yet another one somewhere > somehow. But the tendency to do so feels like a weird computer tinkerer > disease rather something that has ever helped the overall ecosystem. I did not add it because it was 'fun', nor was I trying to be clever here; my initial plan was to use a de facto implementation based on OpenBSD, exactly to avoid the pitfalls of trying to come up with a new RNG scheme. > > So I'm wondering: who actually needs this, and why? What's the > performance requirement like, and why is getrandom(2) insufficient? And > is this really the best approach to take? If this is something needed, > how would you feel about working together on a vDSO approach instead? Or > maybe nobody actually needs this in the first place? The vDSO approach would be a good thing, and if the kernel eventually provides it, I think it would be feasible to wire up arc4random to use it when the underlying kernel supports it. OpenBSD, for instance, has a feature that instructs the kernel to provide random data directly to an ELF segment [4], and they use it to seed various libc hardening features (way more versatile than AT_RANDOM and more fail-proof than getrandom, as we saw on some environment where). > > And secondly, is there anyway that glibc can *not* do this, or has that > ship fully sailed, and I really missed out by not being part of that > discussion whenever it was happening? Well, we have in fact been discussing adding arc4random since Florian's initial proposal [2], roughly 4 years ago, and the initial bug report asking for it is from 15 years ago. 
I still think providing arc4random is a good addition, for the same reason we are proposing to add strlcpy [3]: developers still use such interfaces, source-code compatibility with the BSDs might help developers avoid rolling out their own implementations (even if some developers agree that it is not the best interface), and focusing on one implementation might improve the general ecosystem. As Paul noted, coreutils has its own RNG, and having an arc4random-like interface might free it from needing one (at least on glibc systems). But in the end, if we are clear about it in the documentation and provide alternatives for users who are aware of the limitation, I do not think it is a bad decision. > > Thanks, > Jason [1] https://sourceware.org/bugzilla/show_bug.cgi?id=4417 [2] https://sourceware.org/pipermail/libc-alpha/2018-March/092081.html [3] https://sourceware.org/pipermail/libc-alpha/2022-June/140093.html [4] https://github.com/openbsd/src/blob/master/libexec/ld.so/SPECS.randomdata ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-23 17:39 ` Adhemerval Zanella Netto @ 2022-07-23 22:54 ` Jason A. Donenfeld 2022-07-25 15:33 ` Rich Felker 1 sibling, 0 replies; 81+ messages in thread From: Jason A. Donenfeld @ 2022-07-23 22:54 UTC (permalink / raw) To: Adhemerval Zanella Netto Cc: libc-alpha, Florian Weimer, Yann Droneaud, jann, Michael, Paul Eggert, linux-crypto Hi Adhemerval, Thanks for your reply. On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto wrote: > > Firstly, for what use cases does this actually help? As of recent > > changes to the Linux kernels -- now backported all the way to 4.9! -- > > getrandom() and /dev/urandom are extremely fast and operate over per-cpu > > states locklessly. Sure you avoid a syscall by doing that in userspace, > > but does it really matter? Who exactly benefits from this? > > Mainly performance, since glibc both export getrandom and getentropy. Okay so your motivation is performance. But can you tell me what your performance goals actually are? All kernel.org stable kernels from 4.9 and upwards now have really fast per-cpu lockless implementations of getrandom() and /dev/urandom. If your goal is performance, I would be very, very interested to find out a circumstance where this is insufficient. > There were some discussion on maillist and we also decided to explicit > state this is not a CSRNG on our documentation. Okay that's all the more reason why this is a completely garbage endeavor. Sorry for the strong language, but the last thing anybody needs is another PRNG that's "half way" between being good for crypto and not. If it's not good for crypto, people will use it anyway, especially since you're winking at them saying, "oh but actually chacha20 is fine technically so....", and then fast-forward a few years when you realize you can lean on your non-crypto commitment and make things different. Never underestimate the power of a poorly defined function definition. 
If your goal isn't to make a real CSPRNG, why make this kind of thing at all? And it's especially ridiculous since the OpenBSD arc4random *is* used for crypto. So now you've really muddied the waters. (And naturally the OpenBSD arc4random was done in conjunction with their kernel development, since the same people work on both, which isn't what's happened here.) So your "it's a CSPRNG wink wink but the documentation says not, so actually we're off the hook for doing this well" is a cop-out that will lead to trouble. Going back to my original point: what are the performance requirements that point toward a userspace RNG being required here? If it's not actually necessary, then let's not do this. If it is necessary for some legitimate widespread reason, then let's do this right, and actually make something you're comfortable calling cryptographically secure. And let's get this right from the beginning, so that the new interface doesn't come with all sorts of caveats, "this is safe for glibc ≥ 4.3.2.1 only", or whatever else. Again, I'm not averse to the general concept. I just haven't seen anything really justifying adding the complexity for it. And then assuming that justification does exist somewhere, this approach doesn't seem to be a particularly well planned one. As soon as you find yourself reaching for the "documentation cop-out", something has gone amiss. > The vDSO approach would be good think and if even the kernel provides it > I think it would feasible to wire-up arc4random to use it if the underlying > kernel supports it. So if you justify the performance requirement, wouldn't it make more sense to just back getrandom() itself with a vDSO call? So that way, kernels with that get bits faster (but by how much, really? c'mon...), and kernels without it have things as normal as possible. If your concern is instances in which getrandom() can fail, I'd like to hear what those concerns are so that interface can be fixed and improved. 
> But in the end I think if we are clear about in on the documentation, > and provide alternative when the users are aware of the limitation, I do > not think it is bad decision. This really strikes me as an almost comically ominous expectation. Design interfaces that don't have dangerous pitfalls. While documentation might somehow technically absolve you of responsibility, it doesn't actually help make the ecosystem safer by providing optimal interfaces that don't have cop outs. Anyway, to reiterate: - Can you show me some concerning performance numbers on the current batch of kernel.org stable kernels, and the use cases for which those numbers are concerning, and how widespread you think those use cases are? - If this really *is* necessary for some reason, can we do it well out of the gate, with good coordination between kernel and userland, instead of half-assing it initially and covering that up with a documentation note? Jason ^ permalink raw reply [flat|nested] 81+ messages in thread
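The request above for concrete performance numbers could be approached with a micro-benchmark along these lines. This is only a sketch under stated assumptions: getrandom_ns_per_call is a hypothetical helper, the 256-byte cap on the request size is an arbitrary choice for the example, and a loop like this measures whatever path the C library's getrandom() wrapper takes on the machine at hand, nothing more.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: average wall-clock nanoseconds per getrandom() call
   requesting len bytes (1..256), measured over iters calls.
   Returns -1.0 if the clock or getrandom() fails. */
double getrandom_ns_per_call(size_t len, unsigned iters)
{
    unsigned char buf[256];
    struct timespec t0, t1;

    assert(len > 0 && len <= sizeof buf && iters > 0);
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;
    for (unsigned i = 0; i < iters; i++) {
        /* Treat a short or failed read as an error for this measurement. */
        if (getrandom(buf, len, 0) != (ssize_t)len)
            return -1.0;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / iters;
}
```

Calling, say, getrandom_ns_per_call(32, 1000000) on the kernels under discussion and comparing the result against a userspace generator would give the kind of figure being asked for; the results depend heavily on kernel version, CPU and mitigations, so none are quoted here.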
* Re: arc4random - are you sure we want these? 2022-07-23 17:39 ` Adhemerval Zanella Netto 2022-07-23 22:54 ` Jason A. Donenfeld @ 2022-07-25 15:33 ` Rich Felker 2022-07-25 15:59 ` Adhemerval Zanella Netto ` (2 more replies) 1 sibling, 3 replies; 81+ messages in thread From: Rich Felker @ 2022-07-25 15:33 UTC (permalink / raw) To: Adhemerval Zanella Netto Cc: Jason A. Donenfeld, libc-alpha, Florian Weimer, Yann Droneaud, jann, Michael, Paul Eggert On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto via Libc-alpha wrote: > On 23/07/22 13:25, Jason A. Donenfeld wrote: > > Firstly, for what use cases does this actually help? As of recent > > changes to the Linux kernels -- now backported all the way to 4.9! -- > > getrandom() and /dev/urandom are extremely fast and operate over per-cpu > > states locklessly. Sure you avoid a syscall by doing that in userspace, > > but does it really matter? Who exactly benefits from this? > > Mainly performance, since glibc both export getrandom and getentropy. > There were some discussion on maillist and we also decided to explicit > state this is not a CSRNG on our documentation. This is an extreme documentation/specification bug that *hurts* portability and security. The core contract of the historical arc4random function is that it *is* a CSPRNG. Having a function by that name that's allowed not to be one means now all software using it has to add detection for the broken glibc variant. If the glibc implementation has flaws that actually make it not a CSPRNG, this absolutely needs to be fixed. Not doing so is irresponsible and will set everyone back a long ways. If this is just a case of trying to be "cautious" about overpromising things, the documentation needs fixed to specify that this is a CSPRNG. I'm particularly worried about the wording "these still use a Pseudo-Random generator and should not be used in cryptographic contexts". *All* CSPRNGs are PRNGs. 
Being pseudo-random does not make it not cryptographically safe. The safety depends on the original source of the entropy and the practical irreversibility and other cryptographic properties of the extension function. The fact that this has been stated so poorly in the documentation really has me worried that someone does not understand the issues. I haven't dug into the list mails or actual code to determine to what extent that's the case, but it's really, *really* worrying. Rich ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-25 15:33 ` Rich Felker @ 2022-07-25 15:59 ` Adhemerval Zanella Netto 2022-07-25 17:41 ` Rich Felker 2022-07-25 16:18 ` Sandy Harris 2022-07-25 16:40 ` Florian Weimer 2 siblings, 1 reply; 81+ messages in thread From: Adhemerval Zanella Netto @ 2022-07-25 15:59 UTC (permalink / raw) To: Rich Felker Cc: Jason A. Donenfeld, libc-alpha, Florian Weimer, Yann Droneaud, jann, Michael, Paul Eggert On 25/07/22 12:33, Rich Felker wrote: > > If this is just a case of trying to be "cautious" about overpromising > things, the documentation needs fixed to specify that this is a > CSPRNG. I'm particularly worried about the wording "these still use a > Pseudo-Random generator and should not be used in cryptographic > contexts". *All* CSPRNGs are PRNGs. Being pseudo-random does not make > it not cryptographically safe. The safety depends on the original > source of the entropy and the practical irreversibility and other > cryptographic properties of the extension function. The fact that this > has been stated so poorly in the documentation really has me worried > that someone does not understand the issues. I haven't dug into the > list mails or actual code to determine to what extent that's the case, > but it's really, *really* worrying. That's the main reason to avoid calling it a CSPRNG, since neither Florian nor I am confident enough to certify that the current scheme actually meets all the requirements. It does follow the OpenBSD strategy of a fast-key-erasure random number generator, although all key-reseeding strategies are basically heuristics. If I understand Jason's argument correctly, unless we have a kernel API that actually manages the buffer (so it can reseed or clear it when it sees fit), there is no point in providing a CSPRNG in userspace; just use getrandom instead. ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-25 15:59 ` Adhemerval Zanella Netto @ 2022-07-25 17:41 ` Rich Felker 0 siblings, 0 replies; 81+ messages in thread From: Rich Felker @ 2022-07-25 17:41 UTC (permalink / raw) To: Adhemerval Zanella Netto Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha, Michael, jann On Mon, Jul 25, 2022 at 12:59:39PM -0300, Adhemerval Zanella Netto via Libc-alpha wrote: > > > On 25/07/22 12:33, Rich Felker wrote: > > > > If this is just a case of trying to be "cautious" about overpromising > > things, the documentation needs fixed to specify that this is a > > CSPRNG. I'm particularly worried about the wording "these still use a > > Pseudo-Random generator and should not be used in cryptographic > > contexts". *All* CSPRNGs are PRNGs. Being pseudo-random does not make > > it not cryptographically safe. The safety depends on the original > > source of the entropy and the practical irreversibility and other > > cryptographic properties of the extension function. The fact that this > > has been stated so poorly in the documentation really has me worried > > that someone does not understand the issues. I haven't dug into the > > list mails or actual code to determine to what extent that's the case, > > but it's really, *really* worrying. > > That's the main drive to avoid calling CSPRNGs, since nor me or Florian > is secure enough to certify current scheme can actually follow all the > requirements. It does follow OpenBSD strategy of a fast-key-erasure > random-number generators, although all strategies of key reseeding are > basically heuristics. I think the core problem here is that, in making an implementation of a widely agreed-upon historical function with an existing working definition of what "cryptographically secure" means of a PRNG, you're instead positing a possibly-different definition of "CS" and saying "it might not be CS by this new definition". 
This does genuine harm to understanding of an area developers and users already understand very very poorly. The documentation should state that it's cryptographically secure in the sense normally meant for arc4random, which includes not falsely returning with "success" at early boot (no GRND_INSECURE or AT_RANDOM fallback), but that this does not necessarily include any guarantees about what happens in a program with undefined behavior ("hardening" properties) or things like actively trying to prevent you from cloning state (VM freeze/resume stuff, etc.) > If I understand Jason argument correctly, unless we have a kernel API > which it actually handles the buffer (so it can reseed or clear when it > seems fit), there is no point is proving a CSPRNGs in userspace, use > getrandom instead. As for me, I am in favor of having the interface, and would be fine with having it just wrap getentropy as an unlimited-length version thereof. The value is in having a commonly agreed upon API with common guarantees so as not to promote YOLO NIH of critical stuff like safe fallbacks for entropy. Rich ^ permalink raw reply [flat|nested] 81+ messages in thread
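The alternative mentioned above, an interface that just wraps getentropy as an unlimited-length version of it, might look roughly like the following. It is a sketch, not a proposal for the actual glibc code: sketch_random_buf_via_getentropy is a hypothetical name, and aborting on failure is an assumption chosen so that the function never returns "success" without having filled the buffer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* getentropy() rejects requests larger than 256 bytes, so split the work. */
#define ENTROPY_CHUNK 256

/* Hypothetical name: fill buf with len bytes, for any len. */
void sketch_random_buf_via_getentropy(void *buf, size_t len)
{
    unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        size_t n = len < ENTROPY_CHUNK ? len : ENTROPY_CHUNK;

        /* getentropy() returns 0 on success and -1 on error. */
        if (getentropy(p, n) != 0)
            abort();        /* assumption: never report false success */
        p += n;
        len -= n;
    }
}
```

The function returns void, like arc4random_buf, so callers have no error path to get wrong.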
* Re: arc4random - are you sure we want these? 2022-07-25 15:33 ` Rich Felker 2022-07-25 15:59 ` Adhemerval Zanella Netto @ 2022-07-25 16:18 ` Sandy Harris 2022-07-25 16:40 ` Florian Weimer 2 siblings, 0 replies; 81+ messages in thread From: Sandy Harris @ 2022-07-25 16:18 UTC (permalink / raw) To: Rich Felker Cc: Adhemerval Zanella Netto, Jason A. Donenfeld, libc-alpha, Florian Weimer, Yann Droneaud, Jann Horn, Michael, Paul Eggert, Linux Crypto Mailing List Rich Felker <dalias@libc.org> wrote: > This is an extreme documentation/specification bug that *hurts* > portability and security. The core contract of the historical > arc4random function is that it *is* a CSPRNG. Having a function by > that name that's allowed not to be one means now all software using it > has to add detection for the broken glibc variant. > > If the glibc implementation has flaws that actually make it not a > CSPRNG, this absolutely needs to be fixed. Not doing so is > irresponsible and will set everyone back a long ways. Exactly! ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-25 15:33 ` Rich Felker 2022-07-25 15:59 ` Adhemerval Zanella Netto 2022-07-25 16:18 ` Sandy Harris @ 2022-07-25 16:40 ` Florian Weimer 2022-07-25 16:49 ` Adhemerval Zanella Netto ` (2 more replies) 2 siblings, 3 replies; 81+ messages in thread From: Florian Weimer @ 2022-07-25 16:40 UTC (permalink / raw) To: Rich Felker Cc: Adhemerval Zanella Netto, Jason A. Donenfeld, libc-alpha, Yann Droneaud, jann, Michael, Paul Eggert * Rich Felker: > On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto via Libc-alpha wrote: >> On 23/07/22 13:25, Jason A. Donenfeld wrote: >> > Firstly, for what use cases does this actually help? As of recent >> > changes to the Linux kernels -- now backported all the way to 4.9! -- >> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu >> > states locklessly. Sure you avoid a syscall by doing that in userspace, >> > but does it really matter? Who exactly benefits from this? >> >> Mainly performance, since glibc both export getrandom and getentropy. >> There were some discussion on maillist and we also decided to explicit >> state this is not a CSRNG on our documentation. > > This is an extreme documentation/specification bug that *hurts* > portability and security. The core contract of the historical > arc4random function is that it *is* a CSPRNG. Having a function by > that name that's allowed not to be one means now all software using it > has to add detection for the broken glibc variant. > > If the glibc implementation has flaws that actually make it not a > CSPRNG, this absolutely needs to be fixed. Not doing so is > irresponsible and will set everyone back a long ways. The core issue is that on some kernels/architectures, reading from /dev/urandom can degrade to GRND_INSECURE (approximately), and while the result is likely still unpredictable, not everyone would label that as a CSPRNG. 
If we document arc4random as a CSPRNG, this means that we would have to ditch the fallback code and abort the process if the getrandom system call is not available: when reading from /dev/urandom as a fallback, we have no way of knowing if we are in any of the impacted execution environments. Based on your other comments, it seems that you are interested in such fallbacks, too, but I don't think you can actually have both (CSPRNG + fallback). And then there is the certification issue. We really want applications that already use OpenSSL for other cryptography to use RAND_bytes instead of arc4random. Likewise for GNUTLS and gnutls_rnd. What should authors of those cryptographic libraries do? That's less clear, and really depends on the constraints they operate in (e.g., they may target only a subset of architectures and kernel versions). Thanks, Florian ^ permalink raw reply [flat|nested] 81+ messages in thread
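Where the getrandom system call is available, the distinction between an initialized and an uninitialized kernel RNG can be observed from userspace, which a plain read of /dev/urandom does not offer. The sketch below assumes only the documented GRND_NONBLOCK behaviour (failure with EAGAIN while the pool is not yet initialized); kernel_rng_state is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper.
   Returns  1 if the kernel RNG is initialized,
            0 if it is not yet initialized (getrandom would block),
           -1 if this cannot be determined (e.g. ENOSYS on old kernels). */
int kernel_rng_state(void)
{
    unsigned char byte;
    ssize_t n = getrandom(&byte, 1, GRND_NONBLOCK);

    if (n == 1)
        return 1;
    if (n < 0 && errno == EAGAIN)
        return 0;
    return -1;
}
```

A library could use the -1 case to decide between aborting and falling back; which of the two is appropriate is exactly the open question in this part of the thread.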
* Re: arc4random - are you sure we want these? 2022-07-25 16:40 ` Florian Weimer @ 2022-07-25 16:49 ` Adhemerval Zanella Netto 2022-07-25 16:51 ` Jason A. Donenfeld 2022-07-25 17:44 ` Rich Felker 2 siblings, 0 replies; 81+ messages in thread From: Adhemerval Zanella Netto @ 2022-07-25 16:49 UTC (permalink / raw) To: Florian Weimer, Rich Felker Cc: Jason A. Donenfeld, libc-alpha, Yann Droneaud, jann, Michael, Paul Eggert On 25/07/22 13:40, Florian Weimer wrote: > * Rich Felker: > >> On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto via Libc-alpha wrote: >>> On 23/07/22 13:25, Jason A. Donenfeld wrote: >>>> Firstly, for what use cases does this actually help? As of recent >>>> changes to the Linux kernels -- now backported all the way to 4.9! -- >>>> getrandom() and /dev/urandom are extremely fast and operate over per-cpu >>>> states locklessly. Sure you avoid a syscall by doing that in userspace, >>>> but does it really matter? Who exactly benefits from this? >>> >>> Mainly performance, since glibc both export getrandom and getentropy. >>> There were some discussion on maillist and we also decided to explicit >>> state this is not a CSRNG on our documentation. >> >> This is an extreme documentation/specification bug that *hurts* >> portability and security. The core contract of the historical >> arc4random function is that it *is* a CSPRNG. Having a function by >> that name that's allowed not to be one means now all software using it >> has to add detection for the broken glibc variant. >> >> If the glibc implementation has flaws that actually make it not a >> CSPRNG, this absolutely needs to be fixed. Not doing so is >> irresponsible and will set everyone back a long ways. > > The core issue is that on some kernels/architectures, reading from > /dev/urandom can degrade to GRND_INSECURE (approximately), and while the > result is likely still unpredictable, not everyone would label that as a > CSPRNG. 
> > If we document arc4random as a CSPRNG, this means that we would have to > ditch the fallback code and abort the process if the getrandom system > call is not available: when reading from /dev/urandom as a fallback, we > have no way of knowing if we are in any of the impacted execution > environments. Based on your other comments, it seems that you are > interested in such fallbacks, too, but I don't think you can actually > have both (CSPRNG + fallback). It seems the best course of actions, specially form the fact that document arc4random as a CSPRNG seems to a deal-breaker. > > And then there is the certification issue. We really want applications > that already use OpenSSL for other cryptography to use RAND_bytes > instead of arc4random. Likewise for GNUTLS and gnutls_rnd. What should > authors of those cryptographic libraries? That's less clear, and really > depends on the constraints they operate in (e.g., they may target only a > subset of architectures and kernel versions). > > Thanks, > Florian > ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-25 16:40 ` Florian Weimer 2022-07-25 16:49 ` Adhemerval Zanella Netto @ 2022-07-25 16:51 ` Jason A. Donenfeld 2022-07-25 17:44 ` Rich Felker 2 siblings, 0 replies; 81+ messages in thread From: Jason A. Donenfeld @ 2022-07-25 16:51 UTC (permalink / raw) To: Florian Weimer Cc: Rich Felker, Adhemerval Zanella Netto, libc-alpha, Yann Droneaud, jann, Michael, Paul Eggert, linux-crypto Hi Florian, On Mon, Jul 25, 2022 at 06:40:54PM +0200, Florian Weimer wrote: > The core issue is that on some kernels/architectures, reading from > /dev/urandom can degrade to GRND_INSECURE (approximately), and while the > result is likely still unpredictable, not everyone would label that as a > CSPRNG. On some old kernels (though I think not all?), you can poll on /dev/random. This isn't perfect, as the ancient "non blocking pool" initialized after the "blocking pool", but it's not too imperfect either. Take a look at the previously linked random-util.c. > If we document arc4random as a CSPRNG, this means that we would have to > ditch the fallback code and abort the process if the getrandom system > call is not available: when reading from /dev/urandom as a fallback, we > have no way of knowing if we are in any of the impacted execution > environments. Based on your other comments, it seems that you are > interested in such fallbacks, too, but I don't think you can actually > have both (CSPRNG + fallback). > > And then there is the certification issue. We really want applications > that already use OpenSSL for other cryptography to use RAND_bytes > instead of arc4random. Likewise for GNUTLS and gnutls_rnd. What should > authors of those cryptographic libraries? That's less clear, and really > depends on the constraints they operate in (e.g., they may target only a > subset of architectures and kernel versions). I think all of this is yet another indication that there are some major things to work out -- should we block or not? 
is buffering safe? is the interface correct? -- and so we should just back out the arc4random commit until this has been explored a bit more. We're not gaining anything from rushing this, especially as a "source code compatibility" thing, if there's not even agreement between OSes on what the function does inside. Jason PS: please try to keep linux-crypto@vger.kernel.org CC'd. I've been bouncing these manually when not, but it's hard to keep up with that. ^ permalink raw reply [flat|nested] 81+ messages in thread
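Editor's sketch of the "poll on /dev/random" idea mentioned above: wait until /dev/random reports readable, treat that as a sign that the kernel pool has been seeded, and only then read from /dev/urandom. This is a minimal illustration, not the code from random-util.c or from glibc. The helper name read_urandom_after_seeding and its timeout parameter are hypothetical, and on very old kernels the signal is imperfect for the reason given in the message.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait until /dev/random is readable, then read
 * len bytes from /dev/urandom. Returns 0 on success, -1 on error or
 * timeout. */
int read_urandom_after_seeding(unsigned char *buf, size_t len, int timeout_ms)
{
    int fd = open("/dev/random", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);
    close(fd);
    if (ready <= 0)
        return -1;              /* error or timeout: pool not known to be ready */

    fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    close(fd);
    return 0;
}
```

Both open() calls can fail with ENFILE or EMFILE, or inside a sandbox without /dev, which is the objection raised elsewhere in the thread against relying on device nodes.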
* Re: arc4random - are you sure we want these? 2022-07-25 16:40 ` Florian Weimer 2022-07-25 16:49 ` Adhemerval Zanella Netto 2022-07-25 16:51 ` Jason A. Donenfeld @ 2022-07-25 17:44 ` Rich Felker 2022-07-25 18:33 ` Cristian Rodríguez 2 siblings, 1 reply; 81+ messages in thread From: Rich Felker @ 2022-07-25 17:44 UTC (permalink / raw) To: Florian Weimer Cc: Yann Droneaud, Jason A. Donenfeld, libc-alpha, Michael, jann On Mon, Jul 25, 2022 at 06:40:54PM +0200, Florian Weimer via Libc-alpha wrote: > * Rich Felker: > > > On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto via Libc-alpha wrote: > >> On 23/07/22 13:25, Jason A. Donenfeld wrote: > >> > Firstly, for what use cases does this actually help? As of recent > >> > changes to the Linux kernels -- now backported all the way to 4.9! -- > >> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu > >> > states locklessly. Sure you avoid a syscall by doing that in userspace, > >> > but does it really matter? Who exactly benefits from this? > >> > >> Mainly performance, since glibc both export getrandom and getentropy. > >> There were some discussion on maillist and we also decided to explicit > >> state this is not a CSRNG on our documentation. > > > > This is an extreme documentation/specification bug that *hurts* > > portability and security. The core contract of the historical > > arc4random function is that it *is* a CSPRNG. Having a function by > > that name that's allowed not to be one means now all software using it > > has to add detection for the broken glibc variant. > > > > If the glibc implementation has flaws that actually make it not a > > CSPRNG, this absolutely needs to be fixed. Not doing so is > > irresponsible and will set everyone back a long ways. 
> > The core issue is that on some kernels/architectures, reading from > /dev/urandom can degrade to GRND_INSECURE (approximately), and while the > result is likely still unpredictable, not everyone would label that as a > CSPRNG. Then don't fallback to /dev/urandom. It's not even a failsafe fallback anyway (ENFILE, EMFILE, sandboxes, etc.) so it can't safely be used here. Instead use SYS_sysctl and poll for entropy_avail, looping until it's ready. AFAICT this works reliably on all kernels as far back as glibc supports (assuming nothing idiotic like intentionally patching or configuring out random support, but then it's PEBKAC error, as no distros did this). Rich ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-25 17:44 ` Rich Felker @ 2022-07-25 18:33 ` Cristian Rodríguez 2022-07-25 18:49 ` Rich Felker 0 siblings, 1 reply; 81+ messages in thread From: Cristian Rodríguez @ 2022-07-25 18:33 UTC (permalink / raw) To: Rich Felker Cc: Florian Weimer, Yann Droneaud, jann, Jason A. Donenfeld, libc-alpha, Michael On Mon, Jul 25, 2022 at 1:44 PM Rich Felker <dalias@libc.org> wrote: > Then don't fallback to /dev/urandom. Those are my thoughts as well.. but __libc_fatal() if there is no usable getrandom syscall with the needed semantics, in short making this interface usable only when the kernel is. This is quite drastic, but probably the only sane way to go. ^ permalink raw reply [flat|nested] 81+ messages in thread
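Editor's sketch of the "no fallback, fatal error instead" policy proposed above, written as ordinary application code. The helper name fill_random_or_die is hypothetical, and abort() stands in for glibc's internal __libc_fatal(), which is not available outside the library. It assumes a libc that provides getrandom() in <sys/random.h>.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: fill buf from getrandom(2) or terminate.
 * There is deliberately no /dev/urandom fallback. */
void fill_random_or_die(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        /* flags == 0: block until the kernel pool has been seeded */
        ssize_t n = getrandom(p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            perror("getrandom"); /* e.g. ENOSYS on a kernel without the syscall */
            abort();             /* stand-in for __libc_fatal() */
        }
        p += n;
        len -= (size_t)n;
    }
}
```

The trade-off is the one debated in the thread: callers never see weak output, but a process on a kernel or under a seccomp filter without getrandom is killed.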
* Re: arc4random - are you sure we want these? 2022-07-25 18:33 ` Cristian Rodríguez @ 2022-07-25 18:49 ` Rich Felker 2022-07-27 1:54 ` Theodore Ts'o 0 siblings, 1 reply; 81+ messages in thread From: Rich Felker @ 2022-07-25 18:49 UTC (permalink / raw) To: Cristian Rodríguez Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha, Michael, jann On Mon, Jul 25, 2022 at 02:33:05PM -0400, Cristian Rodríguez via Libc-alpha wrote: > On Mon, Jul 25, 2022 at 1:44 PM Rich Felker <dalias@libc.org> wrote: > > > Then don't fallback to /dev/urandom. > > Those are my thoughts as well.. but __libc_fatal() if there is no > usable getrandom syscall with the needed semantics, in short making > this interface usable only when the kernel is. > > This is quite drastic, but probably the only sane way to go. You can at least try the sysctl and possibly also /dev approaches and only treat this as fatal as a last resort. If you can inspect entropy_avail or poll /dev/random to determine that the pool is initialized this is very safe, I think. And some research on distro practices might uncover whether this should be believed to be complete. (Note: I know some folks have raised seccomp sandboxing as an issue too, but unlike kernel which is sometimes locked in by legacy hardware, bad seccomp filters are in principle always fixable and are a form of user/admin error since it's not valid to make assumptions about what syscalls libc needs.) Rich ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-25 18:49 ` Rich Felker @ 2022-07-27 1:54 ` Theodore Ts'o 2022-07-27 2:16 ` Rich Felker 2022-07-27 11:34 ` Adhemerval Zanella Netto 0 siblings, 2 replies; 81+ messages in thread From: Theodore Ts'o @ 2022-07-27 1:54 UTC (permalink / raw) To: Rich Felker Cc: Cristian Rodríguez, Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha, Michael, jann, linux-crypto On Mon, Jul 25, 2022 at 02:49:30PM -0400, Rich Felker wrote: > > You can at least try the sysctl and possibly also /dev approaches and > only treat this as fatal as a last resort. If you can inspect > entropy_avail or poll /dev/random to determine that the pool is > initialized this is very safe, I think. And some research on distro > practices might uncover whether this should be believed to be > complete. I think people are *way* too worried about what happens if /dev/random is symlinked to /dev/urandom, and/or other bits of insanity. The getrandom(2) system call has been around since v3.17. That's 2014. Even an ancient, obsolete enterprise distro like RHEL 7 backported the getrandom system call in 2017 --- a full 5 years ago. If someone is still using a pre-2017, or $DEITY help them, pre-2014 kernel, that kernel will be so riddled with zero-day vulnerabilities that some fallback to /dev/urandom at boot time will be the ***least*** of their worries from a security perspective. And that's assuming someone who is so hide-bound as to be using a badly obsolete kernel would be interested in going to a bleeding edge libc in the first place! Similarly the LTS kernels have gotten backports of Jason's latest enhancements to the /dev/random driver. Someone who is using an out-of-date LTS kernel is similarly likely to be exposed to any number of zero-day vulnerabilities. Hence, the primary path that glibc should be concerned about, IMHO, should assume that getrandom(2) is (a) secure, and (b) fast.
The other thing to note here is this really is an over-constrained problem. Some people will insist, strongly, that they need cryptographically secure random numbers, above all else. Others will insist that the interface for getting secure random numbers must never block. Still others will insist that they be able to use the crappiest CPUs, on systems with absolutely no entropy that can be harvested from I/O devices, and that they be able to generate mission- or life-critical cryptographic keys milliseconds after the user removes the consumer grade IOT device from the box, and plugs it into the wall for the first time. It is ***impossible*** to satisfy all of these constraints. We do the best that we can in the kernel, but it's an order of magnitude harder to do it in userspace. So unless you want to cop out by saying, "arc4random isn't really secure, so when 10% of all devices reachable on the internet can be breached, don't blame us", I strongly recommend that you leave things to the kernel. - Ted --- "Remember, the 'S' in IOT stands for security." ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-27 1:54 ` Theodore Ts'o @ 2022-07-27 2:16 ` Rich Felker 2022-07-27 2:45 ` Theodore Ts'o 2022-07-27 11:34 ` Adhemerval Zanella Netto 1 sibling, 1 reply; 81+ messages in thread From: Rich Felker @ 2022-07-27 2:16 UTC (permalink / raw) To: Theodore Ts'o Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha, linux-crypto, Michael, jann On Tue, Jul 26, 2022 at 09:54:30PM -0400, Theodore Ts'o via Libc-alpha wrote: > On Mon, Jul 25, 2022 at 02:49:30PM -0400, Rich Felker wrote: > > > > You can at least try the sysctl and possibly also /dev approaches and > > only treat this as fatal as a last resort. If you can inspect > > entropy_avail or poll /dev/random to determine that the pool is > > initialized this is very safe, I think. And some research on distro > > practices might uncover whether this should be believed to be > > complete. > > I think people are *way* too worried about what happens if /dev/random > is symlinked to /dev/urandom, and/or other bits of insanitry. > > The getrandom(3) system call has been around since v3.17. That's > 2014. Last year I helped someone get musl up and running with EABI userspace (all we support) on a pre-EABI kernel (2.6.18 or so?) on embedded hardware in use in the field that could not be upgraded for hardware support reasons. Assuming post-2014 kernel may be okay for desktop/server distros but from my perspective it's pretty unthinkable. > Even an ancient, obsolete enterprise distro like RHEL 7 > backported the getrandom system call in 2017 --- a full 5 years ago. > If someone is still using a pre-2017, or $DEITY help them, pre-2014 > kernel, that kernel will be so riddled with zero-day vulnerabilities > that some fallback to a /dev/urandom at boot time will be the > ***least*** of their worries from a security perspective. 
And that's > assuming someone who is so hide-bound as to be using a badly obsolete > kernel would be interested in going to a bleeding edge libc in the > first place! There's a huge difference in zero-day vulnerabilities which might exist nowhere but on a box that's not exposed to the outside world, and possibly creating compromised key material from said boxes. And weird embedded stuff that can't be upgraded is *also* the same setting where you have a complete lack of early boot entropy. I'm fine with folks who need this stuff coming to musl instead of glibc, but I think folks on the glibc side are doing right to at least *consider* whether/how it matters rather than writing anything older than a few years off as irrelevant. Rich ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-27 2:16 ` Rich Felker @ 2022-07-27 2:45 ` Theodore Ts'o 0 siblings, 0 replies; 81+ messages in thread From: Theodore Ts'o @ 2022-07-27 2:45 UTC (permalink / raw) To: Rich Felker Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha, linux-crypto, Michael, jann On Tue, Jul 26, 2022 at 10:16:19PM -0400, Rich Felker wrote: > Last year I helped someone get musl up and running with EABI userspace > (all we support) on a pre-EABI kernel (2.6.18 or so?) on embedded > hardware in use in the field that could not be upgraded for hardware > support reasons. Assuming post-2014 kernel may be okay for > desktop/server distros but from my perspective it's pretty > unthinkable. Was that machine on the network in any way? Why did it need cryptographic keys in the first place? Sure, maybe there are some super-rare cases where you just *happen* to decide that you need to use ancient hardware to generate keys that are communicated over the serial console to sign official RPM packages for some distro --- but I would *hope* that the distro could afford to spring for hardware that wasn't antediluvian. It's fair that there are stupid people out there who think that it's an OK thing to use software which is riddled with zero-days because they're too cheap to update their hardware. But my point is that if you are worrying about fallback to /dev/urandom being a security hole, what *other* security holes might exist on that system? I can understand the argument that the machine shouldn't fail, which probably means you want to make sure the ancient code doesn't block forever, even if they are generating RSA public/private keypairs for SSL certificates in their init.d scripts, milliseconds after being booted on CPUs so ancient that they don't support RDRAND. But let's be real here about how secure that system is **actually** going to be, even *if* the random number generator is perfect(tm) and bug-free(tm).
Let's not kid ourselves. Cheers, - Ted ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-27 1:54 ` Theodore Ts'o 2022-07-27 2:16 ` Rich Felker @ 2022-07-27 11:34 ` Adhemerval Zanella Netto 2022-07-27 12:32 ` Theodore Ts'o 2022-07-27 15:39 ` Rich Felker 1 sibling, 2 replies; 81+ messages in thread From: Adhemerval Zanella Netto @ 2022-07-27 11:34 UTC (permalink / raw) To: Theodore Ts'o, Rich Felker Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha, linux-crypto, Michael, jann On 26/07/22 22:54, Theodore Ts'o via Libc-alpha wrote: > On Mon, Jul 25, 2022 at 02:49:30PM -0400, Rich Felker wrote: >> >> You can at least try the sysctl and possibly also /dev approaches and >> only treat this as fatal as a last resort. If you can inspect >> entropy_avail or poll /dev/random to determine that the pool is >> initialized this is very safe, I think. And some research on distro >> practices might uncover whether this should be believed to be >> complete. > > I think people are *way* too worried about what happens if /dev/random > is symlinked to /dev/urandom, and/or other bits of insanitry. On glibc, my view has settled on keeping the /dev/urandom fallback, mainly to give ancient kernels that we still nominally support a way to call arc4random without aborting the process (aborting seemed to be a 'feature' frowned upon when someone tried to standardize posix_random with the Austin Group) and to provide a fallback if the environment filters getrandom for whatever reason. But to be realistic, newer glibc versions are usually deployed with newer kernels, and running in an environment without getrandom support will be highly unlikely. The only scenario where it might happen is if someone tries to run some container on an older kernel (that is one reason that prevented us from raising the minimum supported kernel for x86_64 some years ago), but it will most likely have the same issues you described (unless the vendor spent a herculean amount of time on backporting).
The only thing I am kinda worried about is that we will need to be judicious if we aim to use arc4random internally for hardening, since with some usage patterns and kernels we might hit performance issues. For instance, we will need to tune down some internal parameters for a glibc test because now on a somewhat recent kernel (5.15.0-41-generic) I am seeing a 10 runtime increase with the change to use getrandom. Jason has told me it has been fixed upstream, but taking into consideration that the box is an updated Ubuntu 22.04, it might take some time for this fix to propagate to all kernels out there. > > The getrandom(3) system call has been around since v3.17. That's > 2014. Even an ancient, obsolete enterprise distro like RHEL 7 > backported the getrandom system call in 2017 --- a full 5 years ago. > If someone is still using a pre-2017, or $DEITY help them, pre-2014 > kernel, that kernel will be so riddled with zero-day vulnerabilities > that some fallback to a /dev/urandom at boot time will be the > ***least*** of their worries from a security perspective. And that's > assuming someone who is so hide-bound as to be using a badly obsolete > kernel would be interested in going to a bleeding edge libc in the > first place! > > Similarly the LTS kernels have gotten backports of Jason's latest > enhancements to the /dev/random driver. Someone who is using an > out-of-date LTS kernel is similarly likely to be exposed to any number > of zero-day vulnerabilities. Hence, the primary path that glibc > should be concerned about, IMHO, should assume that getrandom(2) is > (a) secure, and (b) fast. > > The other thing to note here is this really is an over-constrained > problem. Some people will insist, strongly, that they need > cryptographically secure random numbers, above all else. Others will > insist that the interface for getting secure random numbers must never > block.
Still others will insist that they be able to use the > crappiest CPU's, on systems with absolutely no entropy that can be > harvested from I/O devices, and that they be able to generate mission- > or -life critical cryptgraphic keys milliseconds after the user > removes the consumer grade IOT device from the box, and plugs it into > wall for the first time. > > It is ***impossible*** to satisfy all of these constraints. We do the > best that we can in the kernel, but it's an order of magnitude harder > to do it in userspace. So unless you want to cop-out by saying, > "arcrandom isn't really secure, so when 10% of all devices reachable > on the internet can breached, don't blame us", I strongly recommend > that you leave things to the kernel. > > - Ted > --- > "Remember, the 'S' in IOT stands for security." ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-27 11:34 ` Adhemerval Zanella Netto @ 2022-07-27 12:32 ` Theodore Ts'o 2022-07-27 12:49 ` Florian Weimer 2022-07-27 15:39 ` Rich Felker 1 sibling, 1 reply; 81+ messages in thread From: Theodore Ts'o @ 2022-07-27 12:32 UTC (permalink / raw) To: Adhemerval Zanella Netto Cc: Rich Felker, Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha, linux-crypto, Michael, jann On Wed, Jul 27, 2022 at 08:34:17AM -0300, Adhemerval Zanella Netto wrote: > The only thing I am kinda worried is we will need to be judicious if > we aim to use arc4random internally for hardening, since on some pattern > usage and kernels we might hit some performance issues. For instance, > we will need to tune down some internal parameters for a glibc testing > because now on a somewhat recent kernel (5.15.0-41-generic) I am seeing > a 10 runtime increase which the change to use getrandom. Jason has > told it has been fixed upstream, but taking in consideration the box > is an updated Ubuntu 22.04, it might take some time to have this fix > propagated on all kernels out there. What I'd suggest is that we be realistic about specific use cases. Are we talking about scientific simulations? You don't want to be using secure random number generation for that anyway, because most reputable scientists have this thing about repeatable experiments. Are we talking about key generation? How many keys per second is it really realistic that such a system would need to support? How many SSL connections would it be *able* to support? And since a secure web server or VPN gateway is going to be on the network, then you're going to want the latest kernel fixes, since there have been quite a few Really Bad Security vulnerabilities that have been fixed just in the past week (especially if you care about FEDRAMP or PCI compliance) at which point, you'll get the new and improved getrandom(2).
But even if you didn't take the latest kernels, I think you will find that if you actually benchmark how many queries per second a real-life secure web server or VPN gateway handles, even the original 5.15.0 /dev/random driver was plenty fast enough for real world cryptographic use cases. Sure, maybe numbers would look small on a low-end ARM system --- but how many secure web transactions or IPSEC/wireguard connections could such a low-end ARM system really support, *anyway*? One of the dirty little secrets of web sites who live and die by clickbait performance benchmark articles for advertising revenue is how rarely real life workloads really are bottlenecked by things like, say, file system or /dev/random benchmarks. Reading those articles is *fun*, for people who like to say that their systems' metrics are longer/faster/stronger/whatever, but it's rare that they actually impact real world use cases. More often than not, the bottleneck is elsewhere. Cheers, - Ted P.S. The newer /dev/random driver would probably help out people who do things like "dd if=/dev/urandom of=/dev/expensive-ssd-where-it-would-be-way-faster- and-less-destructive-of-write-wearout-to-use-hdparm-security-erase bs=4k" --- but that's not really relevant to the glibc arc4random() discussion. :-) ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-27 12:32 ` Theodore Ts'o @ 2022-07-27 12:49 ` Florian Weimer 2022-07-27 20:15 ` Theodore Ts'o 0 siblings, 1 reply; 81+ messages in thread From: Florian Weimer @ 2022-07-27 12:49 UTC (permalink / raw) To: Theodore Ts'o Cc: Adhemerval Zanella Netto, Rich Felker, Yann Droneaud, Jason A. Donenfeld, libc-alpha, linux-crypto, Michael, jann * Theodore Ts'o: > But even if you didn't take the latest kernels, I think you will find > that if you actually benchmark how many queries per second a real-life > secure web server or VPN gateway, even the original 5.15.0 /dev/random > driver was plenty fast enough for real world cryptographic use cases. The idea is that arc4random() is suitable in pretty much all places that have historically used random() (outside of deterministic simulations). Straight calls to getrandom are much, much slower than random(), and it's not even the system call overhead. Thanks, Florian ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-27 12:49 ` Florian Weimer @ 2022-07-27 20:15 ` Theodore Ts'o 2022-07-27 21:59 ` Rich Felker 2022-07-28 0:39 ` Cristian Rodríguez 0 siblings, 2 replies; 81+ messages in thread From: Theodore Ts'o @ 2022-07-27 20:15 UTC (permalink / raw) To: Florian Weimer Cc: Adhemerval Zanella Netto, Rich Felker, Yann Droneaud, Jason A. Donenfeld, libc-alpha, linux-crypto, Michael, jann On Wed, Jul 27, 2022 at 02:49:57PM +0200, Florian Weimer wrote: > * Theodore Ts'o: > > > But even if you didn't take the latest kernels, I think you will find > > that if you actually benchmark how many queries per second a real-life > > secure web server or VPN gateway, even the original 5.15.0 /dev/random > > driver was plenty fast enough for real world cryptographic use cases. > > The idea is to that arc4random() is suitable in pretty much all places > that have historically used random() (outside of deterministic > simulations). Straight calls to getrandom are much, much slower than > random(), and it's not even the system call overhead. What are those places? And what are their performance and security requirements? I've heard some people claim that arc4random() is supposed to provide strong security guarantees. I've heard others claim that it doesn't, or at least that glibc was planning on disclaiming security guarantees. So there seems to be a lack of clarity about the security requirements. What about the performance requirements? Designing an interface where the requirement is "as fast as possible" is often not a great pathway to success, because the reality is that engineering is always about tradeoffs. If there are no security requirements (given the disclaimer that some people want to put in the documentation saying that arc4random might not be secure), why not just have people continue to use random(3)? - Ted ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-27 20:15 ` Theodore Ts'o @ 2022-07-27 21:59 ` Rich Felker 2022-07-28 0:30 ` Theodore Ts'o 2022-07-28 0:39 ` Cristian Rodríguez 1 sibling, 1 reply; 81+ messages in thread From: Rich Felker @ 2022-07-27 21:59 UTC (permalink / raw) To: Theodore Ts'o Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha, Michael, linux-crypto, jann On Wed, Jul 27, 2022 at 04:15:24PM -0400, Theodore Ts'o via Libc-alpha wrote: > On Wed, Jul 27, 2022 at 02:49:57PM +0200, Florian Weimer wrote: > > * Theodore Ts'o: > > > > > But even if you didn't take the latest kernels, I think you will find > > > that if you actually benchmark how many queries per second a real-life > > > secure web server or VPN gateway, even the original 5.15.0 /dev/random > > > driver was plenty fast enough for real world cryptographic use cases. > > > > The idea is to that arc4random() is suitable in pretty much all places > > that have historically used random() (outside of deterministic > > simulations). Straight calls to getrandom are much, much slower than > > random(), and it's not even the system call overhead. > > What are those places? And what are their performance and security > requirements? I've heard some people claim that arc4random() is > supposed to provide strong security guarantees. I've heard others > claim that it doesn't, or at least glibc was planning on disclaiming > security guaranteees. So there seems to be a lack of clarity about > the security requirements. The only place I've heard of a viable "soft requirement" for real entropy is for salting the hash function used in hash table maps to harden them against DoS via intentional collisions. This is a small but arguably legitimate usage domain. Most use of random() is not this, and should not be this -- the value of deterministic execution for ability to reproduce crashes, debug, etc. is real, and the value of actual entropy vs a deterministic-seeded prng is imaginary. 
The purpose of arc4random has always been *cryptographically secure* entropy, not "gratuitously replace random() and break reproducible behavior because the programmer does not understand the difference". Nobody should be advocating for using these functions for anything except secure secrets. Rich ^ permalink raw reply [flat|nested] 81+ messages in thread
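Editor's sketch of the hash-table salting use case described above: one random 64-bit salt is drawn per process (or per table) and mixed into the hash so that an attacker cannot precompute colliding keys. The names salted_hash and make_table_salt are hypothetical. FNV-1a is used only to keep the example short and does not by itself provide collision resistance against an adversary; a real table would use a keyed hash designed for this purpose, such as SipHash.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: FNV-1a over the key, with the salt mixed into
 * the initial state. Same key and same salt give the same hash. */
uint64_t salted_hash(const void *key, size_t len, uint64_t salt)
{
    const unsigned char *p = key;
    uint64_t h = 0xcbf29ce484222325ULL ^ salt;  /* FNV-1a 64-bit offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;                  /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Hypothetical helper: draw one salt from the kernel.
 * Returns 0 on success, -1 on failure. */
int make_table_salt(uint64_t *salt)
{
    ssize_t n = getrandom(salt, sizeof *salt, 0);
    return n == (ssize_t)sizeof *salt ? 0 : -1;
}
```

A table would call make_table_salt() once at creation and pass the stored salt to every salted_hash() call, so the kernel is consulted once per table rather than once per lookup.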
* Re: arc4random - are you sure we want these? 2022-07-27 21:59 ` Rich Felker @ 2022-07-28 0:30 ` Theodore Ts'o 0 siblings, 0 replies; 81+ messages in thread From: Theodore Ts'o @ 2022-07-28 0:30 UTC (permalink / raw) To: Rich Felker Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha, Michael, linux-crypto, jann On Wed, Jul 27, 2022 at 05:59:49PM -0400, Rich Felker wrote: > The only place I've heard of a viable "soft requirement" for real > entropy is for salting the hash function used in hash table maps to > harden them against DoS via intentional collisions. This is a small > but arguably legitimate usage domain. OK, so this is an issue that both Perl and Python have had to deal with, as described here: https://lwn.net/Articles/474912/ Is that a fair description of the use case which you are describing? Because if it is, in the worst case, we only need a single random value for every http request made to the server. Would you agree with that? I think you'll find that even the original getrandom(2) system call or fetching a random value from /dev/urandom was plenty fast enough for this particular use case. If you're on some slow, ancient CPU, the webserver isn't going to be able to handle that many queries per second. And if you're on a fast CPU, the original /dev/urandom and/or getrandom(2) system call would be plenty fast enough. This is why both Jason and I have been trying to push people to clearly articulate a specific use case and the attendant performance requirement, so we can test the hypothesis regarding how critical it is to have a userspace cryptographically secure RNG, with all of the attendant opportunities for security vulnerabilities in the face of VM snapshots, or VM's getting duplicated with a pre-spun execution image, etc., etc. Cheers, - Ted ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-27 20:15 ` Theodore Ts'o 2022-07-27 21:59 ` Rich Felker @ 2022-07-28 0:39 ` Cristian Rodríguez 1 sibling, 0 replies; 81+ messages in thread From: Cristian Rodríguez @ 2022-07-28 0:39 UTC (permalink / raw) To: Theodore Ts'o Cc: Florian Weimer, Yann Droneaud, Jason A. Donenfeld, Rich Felker, libc-alpha, Michael, linux-crypto, jann On Wed, Jul 27, 2022 at 4:15 PM Theodore Ts'o via Libc-alpha <libc-alpha@sourceware.org> wrote: > > On Wed, Jul 27, 2022 at 02:49:57PM +0200, Florian Weimer wrote: > > * Theodore Ts'o: > > > > > But even if you didn't take the latest kernels, I think you will find > > > that if you actually benchmark how many queries per second a real-life > > > secure web server or VPN gateway, even the original 5.15.0 /dev/random > > > driver was plenty fast enough for real world cryptographic use cases. > > > > The idea is to that arc4random() is suitable in pretty much all places > > that have historically used random() (outside of deterministic > > simulations). Straight calls to getrandom are much, much slower than > > random(), and it's not even the system call overhead. > > What are those places? Well, pretty much everywhere a shared library is involved from the start. On one very basic VM here there are 18 shared libraries using srandom, thus perturbing each other's state if loaded by the same process, possibly in a catastrophic/predictable way, and nobody uses the random_r interfaces. > And what are their performance and security > requirements? Common programmers know nothing about this; even seasoned ones don't. If it runs slow or is not a CSPRNG, then the average app will use some userspace PRNG or CSPRNG, or buffer from the kernel somewhere. I do not have to justify this assertion: just download libgcrypt, gnutls, or openssl. None of those libraries use the kernel entropy as the first option; all feed it to either proven or dubious RNG schemes and then pass that to users.
Think about why that is, and why we are discussing yet another interface in the first place. ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-27 11:34 ` Adhemerval Zanella Netto 2022-07-27 12:32 ` Theodore Ts'o @ 2022-07-27 15:39 ` Rich Felker 1 sibling, 0 replies; 81+ messages in thread From: Rich Felker @ 2022-07-27 15:39 UTC (permalink / raw) To: Adhemerval Zanella Netto Cc: Theodore Ts'o, Florian Weimer, Yann Droneaud, Jason A. Donenfeld, libc-alpha, linux-crypto, Michael, jann On Wed, Jul 27, 2022 at 08:34:17AM -0300, Adhemerval Zanella Netto via Libc-alpha wrote: > > > On 26/07/22 22:54, Theodore Ts'o via Libc-alpha wrote: > > On Mon, Jul 25, 2022 at 02:49:30PM -0400, Rich Felker wrote: > >> > >> You can at least try the sysctl and possibly also /dev approaches and > >> only treat this as fatal as a last resort. If you can inspect > >> entropy_avail or poll /dev/random to determine that the pool is > >> initialized this is very safe, I think. And some research on distro > >> practices might uncover whether this should be believed to be > >> complete. > > > > I think people are *way* too worried about what happens if /dev/random > > is symlinked to /dev/urandom, and/or other bits of insanity. > > On glibc, my view has settled on having the /dev/urandom fallback, > mainly to give ancient kernels that we still nominally support a way > to call arc4random without aborting the process (which seemed to be > a 'feature' frowned upon when someone tried to standardize posix_random > with Austin Group) and to give a fallback if the environment for whatever > reason filters getrandom. > > But to be realistic newer glibc are usually deployed with newer kernels > and running on an environment without getrandom support will be highly > unlikely. 
The only scenario where it might happen is if someone tries to > run some container on an older kernel (that is one reason that prevented us > from raising the minimum supported kernel for x86_64 some years ago), but it > will most likely have the same issues you described (unless the vendor > spent a herculean amount of time on backporting). > > The only thing I am kinda worried about is that we will need to be judicious if > we aim to use arc4random internally for hardening, since on some pattern If failure to support the functionality nukes the process, it's not suitable for internal hardening. Also, if it hangs forever in early boot, it's not suitable for internal hardening. AT_RANDOM is the functionality for internal hardening, which glibc already uses and should continue to use, extending it with ChaCha if more bytes are needed. Rich ^ permalink raw reply [flat|nested] 81+ messages in thread
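For readers unfamiliar with AT_RANDOM: it is an auxiliary-vector entry through which the kernel hands every new process a pointer to 16 random bytes, so reading it can neither block nor fail with a syscall error. A minimal sketch of reading it with getauxval(3) follows; `copy_at_random` is a hypothetical helper name, and the ChaCha-based extension to more bytes suggested above is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>

/* Hypothetical helper: copy up to 16 bytes of the kernel-supplied AT_RANDOM
   seed into out. Returns the number of bytes copied, or 0 if the auxiliary
   vector has no AT_RANDOM entry. */
static size_t copy_at_random(unsigned char *out, size_t len)
{
    /* getauxval returns the entry's value; for AT_RANDOM that value is the
       address of 16 random bytes placed there by the kernel at exec time. */
    const unsigned char *seed = (const unsigned char *)getauxval(AT_RANDOM);

    assert(out != NULL);
    if (seed == NULL)
        return 0;
    if (len > 16)
        len = 16;
    memcpy(out, seed, len);
    return len;
}
```

Note that these bytes are fixed for the lifetime of the process and the C library may already have consumed some of them for its own hardening, so application code should treat this as a sketch of the mechanism rather than a general-purpose random source.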
* Re: arc4random - are you sure we want these? 2022-07-23 16:25 ` arc4random - are you sure we want these? Jason A. Donenfeld 2022-07-23 17:18 ` Paul Eggert 2022-07-23 17:39 ` Adhemerval Zanella Netto @ 2022-07-23 19:04 ` Cristian Rodríguez 2022-07-23 22:59 ` Jason A. Donenfeld 2022-07-25 10:14 ` Florian Weimer 2022-07-25 10:11 ` Florian Weimer 2022-07-25 22:57 ` [PATCH] arc4random: simplify design for better safety Jason A. Donenfeld 4 siblings, 2 replies; 81+ messages in thread From: Cristian Rodríguez @ 2022-07-23 19:04 UTC (permalink / raw) To: Jason A. Donenfeld Cc: libc-alpha, Adhemerval Zanella Netto, Florian Weimer, Yann Droneaud, jann, Michael On Sat, Jul 23, 2022 at 12:25 PM Jason A. Donenfeld via Libc-alpha <libc-alpha@sourceware.org> wrote: > For that reason, past discussion of having some random number generation > in userspace libcs has geared toward doing this in the vDSO, somehow, > where the kernel can be part and parcel of that effort. On Linux, just making this interface call "something" from the vDSO that - does not block. - cannot ever fail, or if it does indeed need to bail out, kills the calling thread as a last resort. (if neither of those can be provided, we are back to square one) would be beyond awesome, because it could be usable everywhere, including the dynamic linker, malloc, or whatever else. The question is: is there at least an experimental patch available with a hope of being accepted? ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-23 19:04 ` Cristian Rodríguez @ 2022-07-23 22:59 ` Jason A. Donenfeld 2022-07-24 16:23 ` Cristian Rodríguez 2022-07-25 10:14 ` Florian Weimer 1 sibling, 1 reply; 81+ messages in thread From: Jason A. Donenfeld @ 2022-07-23 22:59 UTC (permalink / raw) To: Cristian Rodríguez Cc: libc-alpha, Adhemerval Zanella Netto, Florian Weimer, Yann Droneaud, jann, Michael, linux-crypto Hi Cristian, On Sat, Jul 23, 2022 at 03:04:36PM -0400, Cristian Rodríguez wrote: > On linux just making this interface call "something" from the VDSO that > > - does not block. > - cannot ever fail or if it does indeed need to bail out it kills the > calling thread as last resort. > > (if neither of those can be provided, we are back to square one) > > Will be beyond awesome because it could be usable everywhere, > including the dynamic linker, malloc or whatever else > question is..is there any at least experimental patch with a hope of > being accepted available ? Doesn't getrandom() already basically have this quality? If you call getrandom(0), it'll block until the RNG is initialized once (which now happens pretty reliably early on in boot). If you call getrandom(GRND_INSECURE), it will skip that blocking. Both mechanisms are reliable and available on all current kernel.org stable kernels. Is there something about these you don't like and think needs fixing? I'm open to suggestions on how to further improve that interface if it has a notable shortcoming. If somebody has a compelling performance case that's widespread and can't be fixed in the kernel alone, I wouldn't be averse to vDSOing it. But such an undertaking would probably be contingent on doing this with the glibc developers, rather than trying to retroactively bandaid an addition that shipped broken with a documentation cop-out. Jason ^ permalink raw reply [flat|nested] 81+ messages in thread
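The two calling modes described in this message might be wrapped as in the sketch below. The helper names (`fill_random`, `random_blocking`, `random_early`) are hypothetical, and the fallback definition of GRND_INSECURE assumes the kernel UAPI value 0x0004 for libc headers that do not yet provide the flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Older libc headers may lack this flag; 0x0004 is the kernel UAPI value. */
#ifndef GRND_INSECURE
#define GRND_INSECURE 0x0004
#endif

/* Hypothetical helper: fill buf completely, retrying on EINTR and on short
   reads. Returns 0 on success, -1 with errno set on any other failure. */
static int fill_random(void *buf, size_t len, unsigned int flags)
{
    unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = getrandom(p, len, flags);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* getrandom(0): blocks until the kernel RNG has been initialized once. */
static int random_blocking(void *buf, size_t len)
{
    return fill_random(buf, len, 0);
}

/* getrandom(GRND_INSECURE): never blocks, but bytes returned before the RNG
   is initialized may be weak. Kernels that do not know the flag are expected
   to reject it with EINVAL, which this wrapper simply reports. */
static int random_early(void *buf, size_t len)
{
    return fill_random(buf, len, GRND_INSECURE);
}
```

Which wrapper is appropriate depends on whether the caller can tolerate blocking in early boot or weak output before initialization; the sketch takes no position on that.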
* Re: arc4random - are you sure we want these? 2022-07-23 22:59 ` Jason A. Donenfeld @ 2022-07-24 16:23 ` Cristian Rodríguez 2022-07-24 21:57 ` Jason A. Donenfeld 0 siblings, 1 reply; 81+ messages in thread From: Cristian Rodríguez @ 2022-07-24 16:23 UTC (permalink / raw) To: Jason A. Donenfeld Cc: libc-alpha, Adhemerval Zanella Netto, Florian Weimer, Yann Droneaud, jann, Michael, linux-crypto On Sat, Jul 23, 2022 at 6:59 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > Doesn't getrandom() already basically have this quality? In current kernels, yes. Problems with old kernels remain. The syscall overhead being too high for some use cases is still a remaining problem; if that were overcome, it could be used for literally everything, including simulations and other stuff. ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-24 16:23 ` Cristian Rodríguez @ 2022-07-24 21:57 ` Jason A. Donenfeld 0 siblings, 0 replies; 81+ messages in thread From: Jason A. Donenfeld @ 2022-07-24 21:57 UTC (permalink / raw) To: Cristian Rodríguez Cc: libc-alpha, Adhemerval Zanella Netto, Florian Weimer, Yann Droneaud, jann, Michael, linux-crypto Hi Cristian, On Sun, Jul 24, 2022 at 12:23:43PM -0400, Cristian Rodríguez wrote: > On Sat, Jul 23, 2022 at 6:59 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > > Doesn't getrandom() already basically have this quality? > > In current kernels. yes. problems with old kernels remain.. Can you outline specifically which kernels you think those are and what the problems you think there are? And how arc4random as currently implemented does away with those problems? I kind of suspect you don't have something specific in mind... > The syscall > overhead being too high for some use cases is still a remaining > problem, Really? Do you have any numbers? I would be very surprised to hear that this is affecting things that intend to use arc4random as a substitute. Could you give me specifics on this? Again, this sounds made up in the absence of something real, widespread, and particular. > if that was overcomed it could be used literally for everything, > including simulations and other stuff. You mentioned simulations, but actually simulations are one thing where you want repeatable randomness -- something insecure with a seed that gives a good distribution and is extremely fast, so that you can repeat your simulation with the same data need-be. For this there are various LFSRs and such that work fine and are well explored. But that's not what getrandom() is, nor arc4random(). 
More generally speaking, there are well-defined RNGs that are for simulations and take seeds, and there are well-defined RNGs that are sufficient for crypto, and then there's a massive valley of ill-defined junk in between that people keep shooting themselves in the foot with. The fact that you won't even call arc4random cryptographically secure (according to Adhemerval's comment) indicates to me that something has gone wrong here. So, please, I urge you to put the brakes on this a little bit. Come up with numbers. Let's lay out the interfaces and properties we want. And then we'll see what we can draw up together. But now I'm just repeating myself. See my earlier reply here: https://lore.kernel.org/linux-crypto/Ytx8GKSZfRt+ZrEO@zx2c4.com/ Jason ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-23 19:04 ` Cristian Rodríguez 2022-07-23 22:59 ` Jason A. Donenfeld @ 2022-07-25 10:14 ` Florian Weimer 1 sibling, 0 replies; 81+ messages in thread From: Florian Weimer @ 2022-07-25 10:14 UTC (permalink / raw) To: Cristian Rodríguez Cc: Jason A. Donenfeld, libc-alpha, Adhemerval Zanella Netto, Yann Droneaud, jann, Michael * Cristian Rodríguez: > On Sat, Jul 23, 2022 at 12:25 PM Jason A. Donenfeld via Libc-alpha > <libc-alpha@sourceware.org> wrote: > >> For that reason, past discussion of having some random number generation >> in userspace libcs has geared toward doing this in the vDSO, somehow, >> where the kernel can be part and parcel of that effort. > > On linux just making this interface call "something" from the VDSO that > > - does not block. > - cannot ever fail or if it does indeed need to bail out it kills the > calling thread as last resort. > > (if neither of those can be provided, we are back to square one) > > Will be beyond awesome because it could be usable everywhere, > including the dynamic linker, malloc or whatever else > question is..is there any at least experimental patch with a hope of > being accepted available ? I agree that this would be nice, but we'd likely have to donate thread-specific data for kernel use, and that's currently totally vaporware. The “cannot ever fail” part is impossible to achieve due to old kernels and seccomp filters. Low-level userspace needs to paper over it in some way, so that applications don't have to deal with it. Thanks, Florian ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-23 16:25 ` arc4random - are you sure we want these? Jason A. Donenfeld ` (2 preceding siblings ...) 2022-07-23 19:04 ` Cristian Rodríguez @ 2022-07-25 10:11 ` Florian Weimer 2022-07-25 11:04 ` Jason A. Donenfeld 2022-07-25 14:56 ` Rich Felker 2022-07-25 22:57 ` [PATCH] arc4random: simplify design for better safety Jason A. Donenfeld 4 siblings, 2 replies; 81+ messages in thread From: Florian Weimer @ 2022-07-25 10:11 UTC (permalink / raw) To: Jason A. Donenfeld via Libc-alpha Cc: Adhemerval Zanella Netto, Yann Droneaud, jann, Michael, Jason A. Donenfeld * Jason A. Donenfeld via Libc-alpha: > I really wonder whether this is a good idea, whether this is something > that glibc wants, and whether it's a design worth committing to in the > long term. Do you object to the interface, or the implementation? The implementation can be improved easily enough at a later date. > Firstly, for what use cases does this actually help? As of recent > changes to the Linux kernels -- now backported all the way to 4.9! -- > getrandom() and /dev/urandom are extremely fast and operate over per-cpu > states locklessly. Sure you avoid a syscall by doing that in userspace, > but does it really matter? Who exactly benefits from this? getrandom may be fast for bulk generation. It's not that great for generating a few bits here and there. For example, shuffling a 1,000-element array takes 18 microseconds with arc4random_uniform in glibc, and 255 microseconds with the naïve getrandom-based implementation (with slightly biased results; measured on an Intel i9-10900T, Fedora's kernel-5.18.11-100.fc35.x86_64). > You miss out on this with arc4random, and if that information _is_ to be > exported to userspace somehow in the future, it would be awfully nice to > design the userspace interface alongside the kernel one. What is the kernel interface you are talking about? 
From an interface standpoint, arc4random_buf and getrandom are very similar, with the main difference being that arc4random_buf cannot report failure (except by terminating the process). > Seen from this perspective, going with OpenBSD's older paradigm might be > rather limiting. Why not work together, between the kernel and libc, to > see if we can come up with something better, before settling on an > interface with semantics that are hard to walk back later? Historically, kernel developers were not interested in solving some of the hard problems (especially early seeding) that prevent the use of getrandom during early userspace stages. > As-is, it's hard to recommend that anybody really use these functions. > Just keep using getrandom(2), which has mostly favorable semantics. Some applications still need to run in configurations where getrandom is not available (either because the kernel is too old, or because it has been disabled via seccomp). > Yes, I get it: it's fun to make a random number generator, and so lots > of projects figure out some way to make yet another one somewhere > somehow. But the tendency to do so feels like a weird computer tinkerer > disease rather something that has ever helped the overall ecosystem. The performance numbers suggest that we benefit from buffering in user space. It might not be necessary to implement expansion in userspace. getrandom (or /dev/urandom) with a moderately-sized buffer could be sufficient. But that's an implementation detail, and something we can revisit later. If we get vDSO acceleration for getrandom (maybe using the userspace thread-specific data donation we discussed for rseq), we might eventually do away with the buffering in glibc. Again, this is an implementation detail we can change easily enough. Thanks, Florian ^ permalink raw reply [flat|nested] 81+ messages in thread
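To make the shuffle comparison above concrete, here is a rough sketch of the "one getrandom() call per draw" pattern. It is not the benchmark code behind the quoted numbers, which was not posted in this message; all function names are hypothetical. Unlike the slightly biased naive variant mentioned above, this version uses rejection sampling so the draws are uniform. The shuffle takes the bounded generator as a function pointer, so that on a libc providing `arc4random_uniform` (same `uint32_t (uint32_t)` shape) the two can be swapped and timed by the reader.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* One getrandom() system call per 32-bit value: the unbuffered pattern. */
static uint32_t syscall_u32(void)
{
    uint32_t v;
    ssize_t n;

    do {
        n = getrandom(&v, sizeof v, 0);
    } while (n < 0 && errno == EINTR);
    if (n != (ssize_t)sizeof v)
        abort();                      /* no way to report failure here */
    return v;
}

/* Uniform value in [0, bound) via rejection sampling; bound must be > 0. */
static uint32_t syscall_uniform(uint32_t bound)
{
    assert(bound > 0);
    uint32_t min = -bound % bound;    /* equals 2^32 mod bound */
    uint32_t r;

    do {
        r = syscall_u32();
    } while (r < min);                /* discard the biased low range */
    return r % bound;
}

/* Fisher-Yates shuffle; uniform(k) must return a value in [0, k). */
static void shuffle(int *a, size_t n, uint32_t (*uniform)(uint32_t))
{
    for (size_t i = n; i > 1; i--) {
        size_t j = uniform((uint32_t)i);
        int tmp = a[i - 1];
        a[i - 1] = a[j];
        a[j] = tmp;
    }
}
```

Calling `shuffle(a, 1000, syscall_uniform)` issues at least 999 system calls, which is the per-draw overhead the buffering argument is about.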
* Re: arc4random - are you sure we want these? 2022-07-25 10:11 ` Florian Weimer @ 2022-07-25 11:04 ` Jason A. Donenfeld 2022-07-25 12:39 ` Florian Weimer 2022-07-25 13:25 ` Jeffrey Walton 2022-07-25 14:56 ` Rich Felker 1 sibling, 2 replies; 81+ messages in thread From: Jason A. Donenfeld @ 2022-07-25 11:04 UTC (permalink / raw) To: Florian Weimer Cc: Jason A. Donenfeld via Libc-alpha, Adhemerval Zanella Netto, Yann Droneaud, jann, Michael, linux-crypto Hi Florian, On Mon, Jul 25, 2022 at 12:11:27PM +0200, Florian Weimer wrote: > > I really wonder whether this is a good idea, whether this is something > > that glibc wants, and whether it's a design worth committing to in the > > long term. > > Do you object to the interface, or the implementation? > > The implementation can be improved easily enough at a later date. Sort of both, as I don't think it's wise to commit to the former without a good idea of the full ideal space of the latter, and very clearly from reading that discussion, that hasn't been explored. In particular, Adhemerval has said you won't be committing to making arc4random suitable for crypto, going so far as to mention it's not a CSPRNG in the documentation. As I described in my reply to him (please read that), the "documentation cop-out" will lead to tears inevitably. Not only is that dangerous and bad to do alone, but it severely muddies the waters with what other operating systems suggest about its permitted use cases. Here's that email for reference: https://lore.kernel.org/linux-crypto/Ytx8GKSZfRt+ZrEO@zx2c4.com/ If you're going to ship an interface that people *will* use for sensitive things -- especially considering Paul's comment about the intent being "source code compatibility" -- then you must not ship it knowingly broken by design. There's no amount of documentation papering that makes this okay. Until you know how to implement it well, don't ship the interface. 
And maybe in the process of trying to implement it well, you'll find something suboptimal about the interface that can be fixed. > > Firstly, for what use cases does this actually help? As of recent > > changes to the Linux kernels -- now backported all the way to 4.9! -- > > getrandom() and /dev/urandom are extremely fast and operate over per-cpu > > states locklessly. Sure you avoid a syscall by doing that in userspace, > > but does it really matter? Who exactly benefits from this? > > getrandom may be fast for bulk generation. It's not that great for > generating a few bits here and there. For example, shuffling a > 1,000-element array takes 18 microseconds with arc4random_uniform in > glibc, and 255 microseconds with the naïve getrandom-based > implementation (with slightly biased results; measured on an Intel > i9-10900T, Fedora's kernel-5.18.11-100.fc35.x86_64). So maybe we should look into vDSO'ing getrandom(), if this is a problem for real use cases, and you find that these sorts of things are widespread in real code? > > You miss out on this with arc4random, and if that information _is_ to be > > exported to userspace somehow in the future, it would be awfully nice to > > design the userspace interface alongside the kernel one. > > What is the kernel interface you are talking about? From an interface > standpoint, arc4random_buf and getrandom are very similar, with the main > difference is that arc4random_buf cannot report failure (except by > terminating the process). Referring to information above about reseeding. So in this case it would be some form of a generation counter most likely. There's also been some discussion about exporting some aspect of the vmgenid counter to userspace. > > Seen from this perspective, going with OpenBSD's older paradigm might be > > rather limiting. 
Why not work together, between the kernel and libc, to > > see if we can come up with something better, before settling on an > > interface with semantics that are hard to walk back later? > > Historically, kernel developers were not interested in solving some of > the hard problems (especially early seeding) that prevent the use of > getrandom during early userspace stages. I really don't know what you're talking about here. I understood you up until the opening parenthesis, and initially thought to reply, "but I am interested! let's work together" or something, but then you mentioned getrandom()'s issues with early userspace, and I became confused. If you use getrandom(GRND_INSECURE), it won't block and you'll get bytes even before the rng has seeded. If you use getrandom(0), the kernel's RNG will use jitter to seed itself ASAP so it doesn't block forever (on platforms where that's possible, anyhow). Both of these qualities mostly predate my heavy involvement. So your statement confuses me. But with that said, if you do find some lack of interest on something you think is important, please give me a try, and maybe you'll have better luck. I very much am interested in solving longstanding problems in this domain. > > As-is, it's hard to recommend that anybody really use these functions. > > Just keep using getrandom(2), which has mostly favorable semantics. > > Some applications still need to run in configurations where getrandom is > not available (either because the kernel is too old, or because it has > been disabled via seccomp). I don't quite understand this. People without getrandom() typically fallback to using /dev/urandom. "But what if FD in derp derp mountns derp rlimit derp explosion derp?!" Yes, sure, which is why getrandom() came about. But doesn't arc4random() fallback to using /dev/urandom in this exact same way? 
I don't see how arc4random() really changes the equation here, except that maybe I should amend my statement to say, "Just keep using getrandom(2) or /dev/urandom, which has mostly favorable semantics." (After all, I didn't see any wild-n-crazy fallback to AT_RANDOM like what systemd does with random-util.c: https://github.com/systemd/systemd/blob/main/src/basic/random-util.c ) Seen in that sense, as I wrote to Paul, if you're after arc4random for source code compatibility -- or because you simply like its non-failing interface and want to commit to that no matter the costs whatsoever -- then you could start by making that a light shim around getrandom() (falling back to /dev/urandom, I guess), and then we can look into ways of accelerating getrandom() for new kernels. This way you don't ship something broken out of the gate, and there's still room for improvement. Though I would still note that committing to the interface early like this comes with some concern. > The performance numbers suggest that we benefit from buffering in user > space. The question is whether it's safe and advisable to buffer this way in userspace. Does userspace have the right information now of when to discard the buffer and get a new one? I suspect it does not. > But that's an implementation detail, and something we can revisit later. No, these are not mere implementation details. When Adhemerval is talking about warning people in the documentation that this shouldn't be used for crypto, that should be a wake up call that something is really off here. Don't ship things you know are broken, and then call that an "implementation detail" that can be hedged with "documentation". If a new function, extra_deluxe_memset(), occasionally wrote a 0x41 somewhere unexpected, you'd laugh if somebody called that a mere implementation detail and suggested you just slap a warning in the documentation and call it a day. Jason ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-25 11:04 ` Jason A. Donenfeld @ 2022-07-25 12:39 ` Florian Weimer 2022-07-25 13:43 ` Jason A. Donenfeld ` (2 more replies) 2022-07-25 13:25 ` Jeffrey Walton 1 sibling, 3 replies; 81+ messages in thread From: Florian Weimer @ 2022-07-25 12:39 UTC (permalink / raw) To: Jason A. Donenfeld via Libc-alpha Cc: Jason A. Donenfeld, Yann Droneaud, Michael, linux-crypto, jann * Jason A. Donenfeld via Libc-alpha: > Hi Florian, > > On Mon, Jul 25, 2022 at 12:11:27PM +0200, Florian Weimer wrote: >> > I really wonder whether this is a good idea, whether this is something >> > that glibc wants, and whether it's a design worth committing to in the >> > long term. >> >> Do you object to the interface, or the implementation? >> >> The implementation can be improved easily enough at a later date. > > Sort of both, as I don't think it's wise to commit to the former without > a good idea of the full ideal space of the latter, and very clearly from > reading that discussion, that hasn't been explored. But we are only concerned with the application interface. Do we really expect that to be different from arc4random_buf and its variants? The interface between glibc and the kernel can be changed without impacting applications. > In particular, Adhemerval has said you won't be committing to making > arc4random suitable for crypto, going so far as to mention it's not a > CSPRNG in the documentation. Below you suggest to use GRND_INSECURE to avoid deadlocks during booting. It's documented in the UAPI header as “Return non-cryptographic random bytes”. I assume it's broadly equivalent to reading from /dev/urandom (which we need to support for backwards compatibility, and currently use to avoid blocking). This means that we cannot really document the resulting bits as cryptographically strong from an application perspective because the kernel is not willing to make this commitment. >> > Firstly, for what use cases does this actually help? 
As of recent >> > changes to the Linux kernels -- now backported all the way to 4.9! -- >> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu >> > states locklessly. Sure you avoid a syscall by doing that in userspace, >> > but does it really matter? Who exactly benefits from this? >> >> getrandom may be fast for bulk generation. It's not that great for >> generating a few bits here and there. For example, shuffling a >> 1,000-element array takes 18 microseconds with arc4random_uniform in >> glibc, and 255 microseconds with the naïve getrandom-based >> implementation (with slightly biased results; measured on an Intel >> i9-10900T, Fedora's kernel-5.18.11-100.fc35.x86_64). > > So maybe we should look into vDSO'ing getrandom(), if this is a problem > for real use cases, and you find that these sorts of things are > widespread in real code? We can investigate that, but it doesn't change the application interface. >> > You miss out on this with arc4random, and if that information _is_ to be >> > exported to userspace somehow in the future, it would be awfully nice to >> > design the userspace interface alongside the kernel one. >> >> What is the kernel interface you are talking about? From an interface >> standpoint, arc4random_buf and getrandom are very similar, with the main >> difference is that arc4random_buf cannot report failure (except by >> terminating the process). > > Referring to information above about reseeding. So in this case it would > be some form of a generation counter most likely. There's also been some > discussion about exporting some aspect of the vmgenid counter to > userspace. We don't need any of that in userspace if the staging buffer is managed by the kernel, which is why the thread-specific data donation is so attractive as an approach. The kernel knows where all these buffers are located and can invalidate them as needed. 
>> > Seen from this perspective, going with OpenBSD's older paradigm might be >> > rather limiting. Why not work together, between the kernel and libc, to >> > see if we can come up with something better, before settling on an >> > interface with semantics that are hard to walk back later? >> >> Historically, kernel developers were not interested in solving some of >> the hard problems (especially early seeding) that prevent the use of >> getrandom during early userspace stages. > > I really don't know what you're talking about here. I understood you up > until the opening parenthesis, and initially thought to reply, "but I am > interested! let's work together" or something, but then you mentioned > getrandom()'s issues with early userspace, and I became confused. If you > use getrandom(GRND_INSECURE), it won't block and you'll get bytes even > before the rng has seeded. If you use getrandom(0), the kernel's RNG > will use jitter to seed itself ASAP so it doesn't block forever (on > platforms where that's possible, anyhow). Both of these qualities mostly > predate my heavy involvement. So your statement confuses me. But with > that said, if you do find some lack of interest on something you think > is important, please give me a try, and maybe you'll have better luck. I > very much am interested in solving longstanding problems in this domain. I tried to de-escalate here, and clearly that didn't work. The context here is that historically, working with the “random” kernel maintainers has been very difficult for many groups of people. Many of us are tired of those non-productive discussions. I forgot that this has recently changed on the kernel side. I understand that it's taking years to overcome these perceptions. glibc is still struggling with this, too. Regarding the technical aspect, GRND_INSECURE is somewhat new-ish, but as I wrote above, its UAPI documentation is a bit scary. Maybe it would be possible to clarify this in the manual pages a bit? 
I *assume* that if we are willing to read from /dev/urandom, we can use GRND_INSECURE right away to avoid that fallback path on sufficiently new kernels. But it would be nice to have confirmation. >> > As-is, it's hard to recommend that anybody really use these functions. >> > Just keep using getrandom(2), which has mostly favorable semantics. >> >> Some applications still need to run in configurations where getrandom is >> not available (either because the kernel is too old, or because it has >> been disabled via seccomp). > > I don't quite understand this. People without getrandom() typically > fallback to using /dev/urandom. "But what if FD in derp derp mountns > derp rlimit derp explosion derp?!" Yes, sure, which is why getrandom() > came about. But doesn't arc4random() fallback to using /dev/urandom in > this exact same way? I don't see how arc4random() really changes the > equation here, except that maybe I should amend my statement to say, > "Just keep using getrandom(2) or /dev/urandom, which has mostly > favorable semantics." (After all, I didn't see any wild-n-crazy fallback > to AT_RANDOM like what systemd does with random-util.c: > https://github.com/systemd/systemd/blob/main/src/basic/random-util.c ) I had some patches with AT_RANDOM fallback, including overwriting AT_RANDOM with output from the seeded PRNG. It's certainly messy. I probably didn't bother to post these patches given how bizarre the whole thing was. I did have fallback to CPU instructions, but that turned out to be unworkable due to bugs in suspend on AMD CPUs (kernel or firmware, unclear). 
> Seen in that sense, as I wrote to Paul, if you're after arc4random for > source code compatibility -- or because you simply like its non-failing > interface and want to commit to that no matter the costs whatsoever -- > then you could start by making that a light shim around getrandom() > (falling back to /dev/urandom, I guess), and then we can look into ways > of accelerating getrandom() for new kernels. This way you don't ship > something broken out of the gate, and there's still room for > improvement. Though I would still note that committing to the interface > early like this comes with some concern. The ChaCha20 generator we currently have in the tree may not be required, true. But this doesn't make what we have today “broken”, it's merely overly complicated. And replacing that with a straight buffer from getrandom does not change the external interface, so we can do this any time we want. >> The performance numbers suggest that we benefit from buffering in user >> space. > > The question is whether it's safe and advisable to buffer this way in > userspace. Does userspace have the right information now of when to > discard the buffer and get a new one? I suspect it does not. Not completely, no, but we can cover many cases. I do not currently see a way around that if we want to promote arc4random_uniform(limit) as a replacement for random() % limit. >> But that's an implementation detail, and something we can revisit later. > > No, these are not mere implementation details. When Adhemerval is > talking about warning people in the documentation that this shouldn't be > used for crypto, that should be a wake up call that something is really > off here. Don't ship things you know are broken, and then call that an > "implementation detail" that can be hedged with "documentation". Again, given the issues around GRND_INSECURE (the reason why it exists), we do not have much choice on the glibc side. 
And these issues will be there for the foreseeable future, whether glibc provides arc4random or not. Thanks, Florian ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-25 12:39 ` Florian Weimer @ 2022-07-25 13:43 ` Jason A. Donenfeld 2022-07-25 13:58 ` Cristian Rodríguez 2022-07-25 16:06 ` Rich Felker 2022-07-26 14:27 ` Overwrittting AT_RANDOM after use (was Re: arc4random - are you sure we want these?) Yann Droneaud 2022-07-26 14:35 ` arc4random - are you sure we want these? Yann Droneaud 2 siblings, 2 replies; 81+ messages in thread From: Jason A. Donenfeld @ 2022-07-25 13:43 UTC (permalink / raw) To: Florian Weimer Cc: Jason A. Donenfeld via Libc-alpha, Yann Droneaud, Michael, linux-crypto, jann Hi Florian, On Mon, Jul 25, 2022 at 02:39:24PM +0200, Florian Weimer wrote: > Below you suggest to use GRND_INSECURE to avoid deadlocks during > booting. It's documented in the UAPI header as “Return > non-cryptographic random bytes”. I assume it's broadly equivalent to > reading from /dev/urandom (which we need to support for backwards > compatibility, and currently use to avoid blocking). This means that we > cannot really document the resulting bits as cryptographically strong > from an application perspective because the kernel is not willing to > make this commitment. > Regarding the technical aspect, GRND_INSECURE is somewhat new-ish, but > as I wrote above, it's UAPI documentation is a bit scary. Maybe it > would be possible to clarify this in the manual pages a bit? I *assume* > that if we are willing to read from /dev/urandom, we can use > GRND_INSECURE right away to avoid that fallback path on sufficiently new > kernels. But it would be nice to have confirmation. getrandom(GRND_INSECURE) is the same as getrandom(0), except before the RNG is seeded, in which case the former will return ~garbage randomness while the latter will block. The only current difference between getrandom(GRND_INSECURE) and /dev/urandom is the latter will try for a second to do the jitter entropy thing if the RNG isn't seeded yet. I agree that the documentation around this is really bad. 
Actually, so much of the documentation is out of date or confusing. Thanks for the kick on this: I really do need to rewrite that / clean it up. So with my random.c maintainer hat on: getrandom(GRND_INSECURE) will return the same "quality" randomness as getrandom(0), except before the RNG is initialized. I'll fix up the docs for that, but feel free to refer to this statement ahead of that if you need. Code-wise, the only relevant branch related to GRND_INSECURE is: if (!crng_ready() && !(flags & GRND_INSECURE)) { if (flags & GRND_NONBLOCK) return -EAGAIN; ret = wait_for_random_bytes(); if (unlikely(ret)) return ret; } That means: if it's not ready, and you didn't pass _INSECURE, and you didn't pass _NONBLOCK, then wait for the RNG to be ready, and error out if that's interrupted by a signal. Other than that one block, it continues on to do the same thing as getrandom(0). With that said, however, I think it'd be nice if you used only blocking randomness, and shove the initialization problem at init systems and bootloaders and such. In 5.20, for example, there'll be an x86 boot protocol for GRUB and kexec and hypervisors and such to pass a seed, and since a long time, there exists a device tree attribute for the same. Proliferating "unsafe" /dev/urandom-style usage doesn't seem good for the ecosystem at large. And I'm in general interest in seeing progress on decades long initialization-time seeding concerns. > > Sort of both, as I don't think it's wise to commit to the former without > > a good idea of the full ideal space of the latter, and very clearly from > > reading that discussion, that hasn't been explored. > > But we are only concerned with the application interface. Do we really > expect that to be different from arc4random_buf and its variants? > > The interface between glibc and the kernel can be changed without > impacting applications. 
I feel like you missed the whole thrust of my argument, in which I caution against shipping something that's known-broken, particularly when it pertains to something sensitive like generating secret keys. Regarding the application interface: it's still unclear what's best until we start trying to see what the implementation would look like. Just to pick something floating around in my head now since reading your last email: there seems to be some question about whether arc4random should block or not. If it's used for crypto, it probably should. But maybe you want an interface that doesn't. Perhaps that discussion leads naturally to exposing a flag. Or not! And then there are related questions about what the return value should be, if any. The point is that the devil is often in the details with these things, and I worry about putting the cart before the horse here. > >> > You miss out on this with arc4random, and if that information _is_ to be > >> > exported to userspace somehow in the future, it would be awfully nice to > >> > design the userspace interface alongside the kernel one. > >> > >> What is the kernel interface you are talking about? From an interface > >> standpoint, arc4random_buf and getrandom are very similar, with the main > >> difference is that arc4random_buf cannot report failure (except by > >> terminating the process). > > > > Referring to information above about reseeding. So in this case it would > > be some form of a generation counter most likely. There's also been some > > discussion about exporting some aspect of the vmgenid counter to > > userspace. > > We don't need any of that in userspace if the staging buffer is managed > by the kernel, which is why the thread-specific data donation is so > attractive as an approach. The kernel knows where all these buffers are > located and can invalidate them as needed. 
There still might be a need for userspace to have that information, for network protocol implementations that need to drop their ephemeral keys on a virtual machine fork, for example. But that's kind of a different discussion. For the purposes of a vDSO'd getrandom(), I agree that the kernel managing a buffer that's just an opaque blob to userspace is probably the best option. > I tried to de-escalate here, and clearly that didn't work. The context > here is that historically, working with the “random” kernel maintainers > has been very difficult for many groups of people. Many of us are tired > of those non-productive discussions. I forgot that this has recently > changed on the kernel side. I understand that it's taking years to > overcome these perceptions. glibc is still struggling with this, too. Oh, I see what you're getting at. Yea, sure, things are potentially different now. I'm eager to work on this, so if you're finding things that are lacking, I'm all ears for fixing them. > I had some patches with AT_RANDOM fallback, including overwriting > AT_RANDOM with output from the seeded PRNG. It's certainly messy. I > probably didn't bother to post these patches given how bizarre the whole > thing was. I did have fallback to CPU instructions, but that turned out > to be unworkable due to bugs in suspend on AMD CPUs (kernel or firmware, > unclear). Yea, it's kind of tricky as other things might be using AT_RANDOM also and then you have a whole race issue and domain separation and whatnot. The thing in systemd isn't really good for crypto -- no forward secrecy and such -- but it's ostensibly better than random(). > The ChaCha20 generator we currently have in the tree may not be > required, true. But this doesn't make what we have today “broken”, it's > merely overly complicated. And replacing that with a straight buffer > from getrandom does not change the external interface, so we can do this > any time we want. 
Whether you use chacha20 in a fast key erasure construction, or you buffer lots of bytes of getrandom() that you overwrite with zeros as you use doesn't really matter in the sense that these are both just forms of buffering. With the chacha20 one, you're reseeding after 16 megs, but of course the state is smaller, but that doesn't matter. For purposes here, we may as well treat that as buffering 16 megs of getrandom() output. My concern with this buffering is that userspace doesn't know when to invalidate the buffer. So a userspace that's using arc4random() for crypto will potentially be missing something *important* that a userspace who used getrandom() instead would have. When I brought this up with Adhemerval, his reply was that it doesn't matter anyway because arc4random() is going to be documented as not for cryptography. So it sounded like the author of it finds it worse too. So yikes. The whole point is that you shouldn't ship something sensitive that is worse than what it will potentially replace, right out of the gate. Slow down and get the thing right, and then ship it. > Not completely, no, but we can cover many cases. I do not currently see > a way around that if we want to promote arc4random_uniform(limit) as a > replacement for random() % limit. I agree that the rejection sampling is the most useful function being added. Let's say, just for the sake of argument, that you instead added `getrandom_u{64,32,16,8}_uniform(u_type limit, unsigned long flags)` that expanded to doing `getrandom(&integer, flags)` and then rejection sampling on that in a loop like usual. It wouldn't be super great, so the first optimization would be to observe that the cost of 32 bytes and the cost of 4 bytes is the same, so you just grab 32 bytes at a time, which basically guarantees you'll get a good number when rejection sampling. Alright, fine, but then maybe you want to use it for shuffling, and then we have your syscall overhead measurements. 
But that's where the vDSO approach comes into play for making it fast. Old systems would have something work that's still safe. New systems would have something work that's safe and fast. Nobody gets something less safe. (As a sidenote, notice how my hypothetical API gives larger types than arc4random_uniform's fixed u32, just sayin'.) Now, spitballing new APIs is kind of besides the point here, as there are 100 different ways to bikeshed that, but what I'm trying to suggest is that there's a way of adding what you want to libc without reducing the quality of it for users, right from the beginning. So why not start out conservatively? Or, if you insist on providing these functions t o d a y, and won't heed my warnings about designing the APIs alongside the implementations, then just make them thin wrappers over getrandom(0) *without* doing fancy buffering, and then optimizations later can improve it. That would be the incremental approach, which wouldn't harm potential users. It also wouldn't shut the door on doing the buffering: if the kernel optimization improvements go nowhere, and you decide it's a lost cause, you can always change the way it works later, and make that decision then. Jason ^ permalink raw reply [flat|nested] 81+ messages in thread
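To make the rejection-sampling idea above concrete, here is a minimal C sketch of the hypothetical `getrandom_u32_uniform` helper floated in this message. The function name and signature come from that suggestion and are not an existing glibc or kernel API; aborting on failure is an assumption borrowed from arc4random's non-failing contract. It fetches 32 bytes per syscall, as proposed, and rejects values that would introduce modulo bias.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper (not a real libc API): returns a uniformly
   distributed value in [0, limit), aborting if getrandom() fails. */
uint32_t getrandom_u32_uniform(uint32_t limit, unsigned int flags)
{
    if (limit < 2)
        return 0;

    /* 2^32 mod limit, computed in 32-bit unsigned arithmetic.  Values
       below this threshold are rejected so that the accepted range
       [min, 2^32) has a size that is an exact multiple of limit. */
    uint32_t min = (uint32_t)(0u - limit) % limit;

    for (;;) {
        uint32_t batch[8];              /* 32 bytes per syscall */
        unsigned char *p = (unsigned char *)batch;
        size_t have = 0;

        while (have < sizeof batch) {
            ssize_t n = getrandom(p + have, sizeof batch - have, flags);
            if (n < 0) {
                if (errno == EINTR)
                    continue;           /* interrupted: retry */
                abort();                /* assumption: no error reporting */
            }
            have += (size_t)n;
        }

        for (size_t i = 0; i < sizeof batch / sizeof batch[0]; i++)
            if (batch[i] >= min)
                return batch[i] % limit;
        /* All eight candidates rejected (very unlikely): fetch more. */
    }
}
```

Calling `getrandom_u32_uniform(33, 0)` would then stand in for `rand() % 33`. Note that this sketch discards the unused candidates of each batch rather than keeping them around, so nothing is buffered between calls.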
* Re: arc4random - are you sure we want these? 2022-07-25 13:43 ` Jason A. Donenfeld @ 2022-07-25 13:58 ` Cristian Rodríguez 2022-07-25 16:06 ` Rich Felker 1 sibling, 0 replies; 81+ messages in thread From: Cristian Rodríguez @ 2022-07-25 13:58 UTC (permalink / raw) To: Jason A. Donenfeld Cc: Florian Weimer, Yann Droneaud, jann, Jason A. Donenfeld via Libc-alpha, linux-crypto, Michael On Mon, Jul 25, 2022 at 9:44 AM Jason A. Donenfeld via Libc-alpha <libc-alpha@sourceware.org> wrote: > Or, if you insist on providing these functions t o d a y, and won't heed > my warnings about designing the APIs alongside the implementations, then > just make them thin wrappers over getrandom(0) *without* doing fancy > buffering, and then optimizations later can improve it. That would be > the incremental approach, which wouldn't harm potential users. It also > wouldn't shut the door on doing the buffering: if the kernel > optimization improvements go nowhere, and you decide it's a lost cause, > you can always change the way it works later, and make that decision > then. My 2 CLP here, if that matters: I agree with this sentiment/approach. Provide these functions for source compat, all of which just call getrandom and abort on failure *for now*, and then a future iteration can do something about the syscall overhead with kernel help. ^ permalink raw reply [flat|nested] 81+ messages in thread
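The "thin wrapper that aborts on failure" approach endorsed here can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `sketch_` names are hypothetical (chosen to avoid clashing with a libc that already ships arc4random), and the /dev/urandom fallback for kernels without getrandom(), which a real implementation would need, is deliberately left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical stand-in for arc4random_buf(): fill buf with len bytes
   from getrandom(0), with no userspace buffering.  There is no error
   return, so any failure other than EINTR terminates the process. */
void sketch_arc4random_buf(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = getrandom(p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            abort();            /* e.g. ENOSYS; no fallback in this sketch */
        }
        p += n;
        len -= (size_t)n;
    }
}

/* Hypothetical stand-in for arc4random(): one 32-bit value per call. */
uint32_t sketch_arc4random(void)
{
    uint32_t value;

    sketch_arc4random_buf(&value, sizeof value);
    return value;
}
```

Because the external interface is just "fill this buffer, never fail", the body could later be swapped for a vDSO-accelerated path without callers noticing.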
* Re: arc4random - are you sure we want these? 2022-07-25 13:43 ` Jason A. Donenfeld 2022-07-25 13:58 ` Cristian Rodríguez @ 2022-07-25 16:06 ` Rich Felker 2022-07-25 16:43 ` Florian Weimer 1 sibling, 1 reply; 81+ messages in thread From: Rich Felker @ 2022-07-25 16:06 UTC (permalink / raw) To: Jason A. Donenfeld Cc: Florian Weimer, Yann Droneaud, jann, Jason A. Donenfeld via Libc-alpha, linux-crypto, Michael On Mon, Jul 25, 2022 at 03:43:57PM +0200, Jason A. Donenfeld via Libc-alpha wrote: > Hi Florian, > > On Mon, Jul 25, 2022 at 02:39:24PM +0200, Florian Weimer wrote: > > Below you suggest to use GRND_INSECURE to avoid deadlocks during > > booting. It's documented in the UAPI header as “Return > > non-cryptographic random bytes”. I assume it's broadly equivalent to > > reading from /dev/urandom (which we need to support for backwards > > compatibility, and currently use to avoid blocking). This means that we > > cannot really document the resulting bits as cryptographically strong > > from an application perspective because the kernel is not willing to > > make this commitment. > > Regarding the technical aspect, GRND_INSECURE is somewhat new-ish, but > > as I wrote above, it's UAPI documentation is a bit scary. Maybe it > > would be possible to clarify this in the manual pages a bit? I *assume* > > that if we are willing to read from /dev/urandom, we can use > > GRND_INSECURE right away to avoid that fallback path on sufficiently new > > kernels. But it would be nice to have confirmation. > > getrandom(GRND_INSECURE) is the same as getrandom(0), except before the > RNG is seeded, in which case the former will return ~garbage randomness > while the latter will block. The only current difference between > getrandom(GRND_INSECURE) and /dev/urandom is the latter will try for a > second to do the jitter entropy thing if the RNG isn't seeded yet. > > I agree that the documentation around this is really bad. 
Actually, so > much of the documentation is out of date or confusing. Thanks for the > kick on this: I really do need to rewrite that / clean it up. > > So with my random.c maintainer hat on: getrandom(GRND_INSECURE) will > return the same "quality" randomness as getrandom(0), except before > the RNG is initialized. I'll fix up the docs for that, but feel free to > refer to this statement ahead of that if you need. > > Code-wise, the only relevant branch related to GRND_INSECURE is: > > if (!crng_ready() && !(flags & GRND_INSECURE)) { > if (flags & GRND_NONBLOCK) > return -EAGAIN; > ret = wait_for_random_bytes(); > if (unlikely(ret)) > return ret; > } > > That means: if it's not ready, and you didn't pass _INSECURE, and you > didn't pass _NONBLOCK, then wait for the RNG to be ready, and error out > if that's interrupted by a signal. Other than that one block, it > continues on to do the same thing as getrandom(0). > > With that said, however, I think it'd be nice if you used only blocking > randomness, and shove the initialization problem at init systems and > bootloaders and such. In 5.20, for example, there'll be an x86 boot > protocol for GRUB and kexec and hypervisors and such to pass a seed, and > since a long time, there exists a device tree attribute for the same. > Proliferating "unsafe" /dev/urandom-style usage doesn't seem good for > the ecosystem at large. And I'm in general interest in seeing progress > on decades long initialization-time seeding concerns. arc4random's contract is supposed to be that it always succeeds and always produces cryptographic output. It cannot use GRND_INSECURE or other insecure fallback methods to avoid blocking. It has to block. This function (inherently, in its contract) is not usable for early boot stuff where one is pretending to want actual cryptographic entropy but is just as happy getting some "high quality" non-CS stuff, and thereby would be just as happy with rand() or likely even with "42". 
Programs that will run in that context on Linux need to be explicitly aware of the messy "early boot" situation and figure out how they're going to handle it securely or if they even wanted CS randomness to begin with. Fortunately virtually nothing has to do that. On most (non-embedded) systems, init can just bring up a rw filesystem with saved entropy on it early and load that, then provide a fully-working environment to programs it invokes. > > I had some patches with AT_RANDOM fallback, including overwriting > > AT_RANDOM with output from the seeded PRNG. It's certainly messy. I > > probably didn't bother to post these patches given how bizarre the whole > > thing was. I did have fallback to CPU instructions, but that turned out > > to be unworkable due to bugs in suspend on AMD CPUs (kernel or firmware, > > unclear). > > Yea, it's kind of tricky as other things might be using AT_RANDOM also > and then you have a whole race issue and domain separation and whatnot. > The thing in systemd isn't really good for crypto -- no forward secrecy > and such -- but it's ostensibly better than random(). AT_RANDOM is unusable as a fallback here because it's equivalent to GRND_INSECURE. It's silently broken at early boot time. In musl we're likely going to end up using the legacy SYS_sysctl on pre-getrandom kernels even though it spammed syslog just because it seems to be the only way to get blocking secure entropy on those kernels. Rich ^ permalink raw reply [flat|nested] 81+ messages in thread
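For programs that do need to be aware of the early-boot situation described above, the flag handling quoted earlier in the thread suggests one way to probe it from userspace: with GRND_NONBLOCK set and GRND_INSECURE clear, getrandom() fails with EAGAIN while the RNG is not yet initialized. The following is a minimal sketch under that assumption; `rng_is_seeded` is a hypothetical helper name, not an existing API.

```c
#include <errno.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if the kernel RNG is initialized,
   0 if a blocking getrandom(0) call would currently wait, and -1 if
   the state cannot be determined (for example, getrandom() is
   unavailable or filtered by seccomp). */
int rng_is_seeded(void)
{
    unsigned char byte;
    ssize_t n = getrandom(&byte, 1, GRND_NONBLOCK);

    if (n == 1)
        return 1;
    if (n < 0 && errno == EAGAIN)
        return 0;
    return -1;
}
```

A caller that gets 0 can decide explicitly whether to wait for the RNG or to proceed without cryptographic randomness, instead of silently receiving weak output.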
* Re: arc4random - are you sure we want these? 2022-07-25 16:06 ` Rich Felker @ 2022-07-25 16:43 ` Florian Weimer 0 siblings, 0 replies; 81+ messages in thread From: Florian Weimer @ 2022-07-25 16:43 UTC (permalink / raw) To: Rich Felker Cc: Jason A. Donenfeld, Yann Droneaud, jann, Jason A. Donenfeld via Libc-alpha, linux-crypto, Michael * Rich Felker: > AT_RANDOM is unusable as a fallback here because it's equivalent to > GRND_INSECURE. It's silently broken at early boot time. In musl we're > likely going to end up using the legacy SYS_sysctl on pre-getrandom > kernels even though it spammed syslog just because it seems to be the > only way to get blocking secure entropy on those kernels. Even pre-getrandom, sysctl was rarely enabled in kernel configurations if I recall correctly. I doubt it is an option to avoid process termination with old kernels/seccomp filters. Thanks, Florian ^ permalink raw reply [flat|nested] 81+ messages in thread
* Overwrittting AT_RANDOM after use (was Re: arc4random - are you sure we want these?) 2022-07-25 12:39 ` Florian Weimer 2022-07-25 13:43 ` Jason A. Donenfeld @ 2022-07-26 14:27 ` Yann Droneaud 2022-07-26 14:35 ` arc4random - are you sure we want these? Yann Droneaud 2 siblings, 0 replies; 81+ messages in thread From: Yann Droneaud @ 2022-07-26 14:27 UTC (permalink / raw) To: Florian Weimer, Jason A. Donenfeld via Libc-alpha Cc: Jason A. Donenfeld, Yann Droneaud, Michael, linux-crypto, jann, dalias Hi, On 25/07/2022 at 14:39, Florian Weimer wrote: > * Jason A. Donenfeld via Libc-alpha: >> (After all, I didn't see any wild-n-crazy fallback >> to AT_RANDOM like what systemd does with random-util.c: >> https://github.com/systemd/systemd/blob/main/src/basic/random-util.c ) > I had some patches with AT_RANDOM fallback, including overwriting > AT_RANDOM with output from the seeded PRNG. It's certainly messy. I > probably didn't bother to post these patches given how bizarre the whole > thing was. It's not that bizarre, as I have some patches too: I tried to harden the way stack_chk_guard and pointer_chk_guard are computed. Those values are currently generated from slices of AT_RANDOM by the loader. But I've seen programs in the wild reusing AT_RANDOM, thus possibly leaking the stack_chk_guard and pointer_chk_guard values. Having a proper (CS)PRNG in the loader, initialized from AT_RANDOM, that overwrites AT_RANDOM (with fresh entropy if possible) after initialization would improve the situation for programs that abuse AT_RANDOM's purpose. Regards. -- Yann Droneaud OPTEYA ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-25 12:39 ` Florian Weimer 2022-07-25 13:43 ` Jason A. Donenfeld 2022-07-26 14:27 ` Overwrittting AT_RANDOM after use (was Re: arc4random - are you sure we want these?) Yann Droneaud @ 2022-07-26 14:35 ` Yann Droneaud 2 siblings, 0 replies; 81+ messages in thread From: Yann Droneaud @ 2022-07-26 14:35 UTC (permalink / raw) To: Florian Weimer, Jason A. Donenfeld via Libc-alpha Cc: Jason A. Donenfeld, Michael, linux-crypto, jann Hi, On 25/07/2022 at 14:39, Florian Weimer wrote: > * Jason A. Donenfeld via Libc-alpha: >>> The performance numbers suggest that we benefit from buffering in user >>> space. >> The question is whether it's safe and advisable to buffer this way in >> userspace. Does userspace have the right information now of when to >> discard the buffer and get a new one? I suspect it does not. > Not completely, no, but we can cover many cases. I do not currently see > a way around that if we want to promote arc4random_uniform(limit) as a > replacement for random() % limit. +1 That's the reason I reviewed the implementation positively: for me, arc4random is not about generating secret keys but small integers. I want to be able to divert developers from srand(time(NULL)) identifier = rand() % 33 to identifier = arc4random_uniform(33) Safe, fast, and reasonably secure. Regards. -- Yann Droneaud OPTEYA ^ permalink raw reply [flat|nested] 81+ messages in thread
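One reason `rand() % 33` is worth moving away from, besides the predictable `srand(time(NULL))` seed, is modulo bias: when the number of generator outputs is not a multiple of the limit, the low residues occur slightly more often. The small C sketch below works through that arithmetic. `residue_count` is a hypothetical helper for illustration, and the figure of 2^31 equally likely outputs is an assumption matching glibc's RAND_MAX of 2147483647.

```c
#include <stdint.h>

/* Hypothetical helper: given a generator with `range` equally likely
   outputs 0..range-1, return how many of them reduce to residue r
   when taken modulo limit (with r < limit). */
uint64_t residue_count(uint64_t range, uint64_t limit, uint64_t r)
{
    /* Every residue gets range / limit outputs; the first
       range % limit residues get one extra. */
    return range / limit + (r < range % limit ? 1 : 0);
}
```

With range = 2^31 and limit = 33, the remainder is 2, so residues 0 and 1 are each hit by 65075263 outputs while the other 31 residues get 65075262. The skew is small for a limit of 33 but grows with the limit; rejection sampling, as done by arc4random_uniform, removes it entirely.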
* Re: arc4random - are you sure we want these? 2022-07-25 11:04 ` Jason A. Donenfeld 2022-07-25 12:39 ` Florian Weimer @ 2022-07-25 13:25 ` Jeffrey Walton 2022-07-25 13:48 ` Jason A. Donenfeld 1 sibling, 1 reply; 81+ messages in thread From: Jeffrey Walton @ 2022-07-25 13:25 UTC (permalink / raw) To: Jason A. Donenfeld; +Cc: Linux Crypto Mailing List On Mon, Jul 25, 2022 at 7:08 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > ... > > The performance numbers suggest that we benefit from buffering in user > > space. > > The question is whether it's safe and advisable to buffer this way in > userspace. Does userspace have the right information now of when to > discard the buffer and get a new one? I suspect it does not. I _think_ the sharp edge on userspace buffering is generator state. Most generator threat models I have seen assume the attacker does not know the generator's state. If buffering occurs in the application, then it may be easier for an attacker to learn of the generator's state. If buffering occurs in the kernel, then generator state should be private from an userspace application's view. Jeff ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-25 13:25 ` Jeffrey Walton @ 2022-07-25 13:48 ` Jason A. Donenfeld 0 siblings, 0 replies; 81+ messages in thread From: Jason A. Donenfeld @ 2022-07-25 13:48 UTC (permalink / raw) To: Jeffrey Walton; +Cc: Linux Crypto Mailing List, libc-alpha Hi Jeffrey, Please keep libc-alpha@sourceware.org CC'd. On Mon, Jul 25, 2022 at 09:25:58AM -0400, Jeffrey Walton wrote: > On Mon, Jul 25, 2022 at 7:08 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > ... > > > The performance numbers suggest that we benefit from buffering in user > > > space. > > > > The question is whether it's safe and advisable to buffer this way in > > userspace. Does userspace have the right information now of when to > > discard the buffer and get a new one? I suspect it does not. > > I _think_ the sharp edge on userspace buffering is generator state. > Most generator threat models I have seen assume the attacker does not > know the generator's state. If buffering occurs in the application, > then it may be easier for an attacker to learn of the generator's > state. If buffering occurs in the kernel, then generator state should > be private from an userspace application's view. I guess that's one concern, if you're worried about Heartbleed-like attacks, in which an undetected RNG state compromise might be easier to pull off. What I have in mind, though, are the various triggers and heuristics that the kernel uses to decide when it needs to reseed. Userspace doesn't know about these. Jason ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: arc4random - are you sure we want these? 2022-07-25 10:11 ` Florian Weimer 2022-07-25 11:04 ` Jason A. Donenfeld @ 2022-07-25 14:56 ` Rich Felker 1 sibling, 0 replies; 81+ messages in thread From: Rich Felker @ 2022-07-25 14:56 UTC (permalink / raw) To: Florian Weimer Cc: Jason A. Donenfeld via Libc-alpha, Yann Droneaud, jann, Jason A. Donenfeld, Michael On Mon, Jul 25, 2022 at 12:11:27PM +0200, Florian Weimer via Libc-alpha wrote: > * Jason A. Donenfeld via Libc-alpha: > > > I really wonder whether this is a good idea, whether this is something > > that glibc wants, and whether it's a design worth committing to in the > > long term. > > Do you object to the interface, or the implementation? That was *exactly* my first question too. > The implementation can be improved easily enough at a later date. > > > Firstly, for what use cases does this actually help? As of recent > > changes to the Linux kernels -- now backported all the way to 4.9! -- > > getrandom() and /dev/urandom are extremely fast and operate over per-cpu > > states locklessly. Sure you avoid a syscall by doing that in userspace, > > but does it really matter? Who exactly benefits from this? > > getrandom may be fast for bulk generation. It's not that great for > generating a few bits here and there. For example, shuffling a > 1,000-element array takes 18 microseconds with arc4random_uniform in > glibc, and 255 microseconds with the naïve getrandom-based > implementation (with slightly biased results; measured on an Intel > i9-10900T, Fedora's kernel-5.18.11-100.fc35.x86_64). > > > You miss out on this with arc4random, and if that information _is_ to be > > exported to userspace somehow in the future, it would be awfully nice to > > design the userspace interface alongside the kernel one. > > What is the kernel interface you are talking about? 
From an interface > standpoint, arc4random_buf and getrandom are very similar, with the main > difference is that arc4random_buf cannot report failure (except by > terminating the process). > > > Seen from this perspective, going with OpenBSD's older paradigm might be > > rather limiting. Why not work together, between the kernel and libc, to > > see if we can come up with something better, before settling on an > > interface with semantics that are hard to walk back later? > > Historically, kernel developers were not interested in solving some of > the hard problems (especially early seeding) that prevent the use of > getrandom during early userspace stages. > > > As-is, it's hard to recommend that anybody really use these functions. > > Just keep using getrandom(2), which has mostly favorable semantics. > > Some applications still need to run in configurations where getrandom is > not available (either because the kernel is too old, or because it has > been disabled via seccomp). > > > Yes, I get it: it's fun to make a random number generator, and so lots > > of projects figure out some way to make yet another one somewhere > > somehow. But the tendency to do so feels like a weird computer tinkerer > > disease rather something that has ever helped the overall ecosystem. > > The performance numbers suggest that we benefit from buffering in user > space. It might not be necessary to implement expansion in userspace. > getrandom (or /dev/urandom) with a moderately-sized buffer could be > sufficient. FWIW I'd rather have a few kB of shareable entropy-expansion .text in userspace than a few kB per process (or even per thread? >_<) of nonshareable data any day. > But that's an implementation detail, and something we can revisit later. > If we vDSO acceleration for getrandom (maybe using the userspace > thread-specific data donation we discussed for rseq), we might > eventually do way with the buffering in glibc. 
Again this is an > implementation detail we can change easily enough. Exactly. FWIW I've been kinda waiting to see what glibc would do on this after the posix_random proposal failed, before considering much what we should do in musl, but the value I see in either is not as an optimization but as honoring a well-known interface so we have fewer applications doing their own stupid YOLO stuff trying to get secure entropy and botching it. So far the best we have is getentropy but it fails on old kernels. At some point musl will probably implement both arc4random and getentropy with secure fallback process for old kernels -- certainly the fallback is needed for meeting the arc4random contract and I'd like it on both places. Rich ^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH] arc4random: simplify design for better safety 2022-07-23 16:25 ` arc4random - are you sure we want these? Jason A. Donenfeld ` (3 preceding siblings ...) 2022-07-25 10:11 ` Florian Weimer @ 2022-07-25 22:57 ` Jason A. Donenfeld 2022-07-25 23:11 ` Jason A. Donenfeld ` (2 more replies) 4 siblings, 3 replies; 81+ messages in thread From: Jason A. Donenfeld @ 2022-07-25 22:57 UTC (permalink / raw) To: libc-alpha Cc: Jason A. Donenfeld, Adhemerval Zanella Netto, Florian Weimer, Cristian Rodríguez, Paul Eggert, linux-crypto Rather than buffering 16 MiB of entropy in userspace (by way of chacha20), simply call getrandom() every time. This approach is doubtlessly slower, for now, but trying to prematurely optimize arc4random appears to be leading toward all sorts of nasty properties and gotchas. Instead, this patch takes a much more conservative approach. The interface is added as a basic loop wrapper around getrandom(), and then later, the kernel and libc together can work together on optimizing that. This prevents numerous issues in which userspace is unaware of when it really must throw away its buffer, since we avoid buffering all together. Future improvements may include userspace learning more from the kernel about when to do that, which might make these sorts of chacha20-based optimizations more possible. The current heuristic of 16 MiB is meaningless garbage that doesn't correspond to anything the kernel might know about. So for now, let's just do something conservative that we know is correct and won't lead to cryptographic issues for users of this function. This patch might be considered along the lines of, "optimization is the root of all evil," in that the much more complex implementation it replaces moves too fast without considering security implications, whereas the incremental approach done here is a much safer way of going about things. 
Once this lands, we can take our time in optimizing this properly using new interplay between the kernel and userspace. getrandom(0) is used, since that's the one that ensures the bytes returned are cryptographically secure. But on systems without it, we fallback to using /dev/urandom. This is unfortunate because it means opening a file descriptor, but there's not much of a choice. Secondly, as part of the fallback, in order to get more or less the same properties of getrandom(0), we poll on /dev/random, and if the poll succeeds at least once, then we assume the RNG is initialized. This is a rough approximation, as the ancient "non-blocking pool" initialized after the "blocking pool", not before, but it's the best approximation we can do. The motivation for including arc4random, in the first place, is to have source-level compatibility with existing code. That means this patch doesn't attempt to litigate the interface itself. It does, however, choose a conservative approach for implementing it. Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: Cristian Rodríguez <crrodriguez@opensuse.org> Cc: Paul Eggert <eggert@cs.ucla.edu> Cc: linux-crypto@vger.kernel.org Signed-off-by: Jason A. 
Donenfeld <Jason@zx2c4.com> --- LICENSES | 23 - include/stdlib.h | 3 - stdlib/Makefile | 2 - stdlib/arc4random.c | 205 ++----- stdlib/arc4random.h | 48 -- stdlib/chacha20.c | 191 ------ stdlib/tst-arc4random-chacha20.c | 167 ----- sysdeps/aarch64/Makefile | 4 - sysdeps/aarch64/chacha20-aarch64.S | 314 ---------- sysdeps/aarch64/chacha20_arch.h | 40 -- sysdeps/generic/chacha20_arch.h | 24 - sysdeps/generic/tls-internal.c | 10 - sysdeps/mach/hurd/_Fork.c | 2 - sysdeps/nptl/_Fork.c | 2 - .../powerpc/powerpc64/be/multiarch/Makefile | 4 - .../powerpc64/be/multiarch/chacha20-ppc.c | 1 - .../powerpc64/be/multiarch/chacha20_arch.h | 42 -- sysdeps/powerpc/powerpc64/power8/Makefile | 5 - .../powerpc/powerpc64/power8/chacha20-ppc.c | 256 -------- .../powerpc/powerpc64/power8/chacha20_arch.h | 37 -- sysdeps/s390/s390-64/Makefile | 6 - sysdeps/s390/s390-64/chacha20-s390x.S | 573 ------------------ sysdeps/s390/s390-64/chacha20_arch.h | 45 -- sysdeps/unix/sysv/linux/tls-internal.c | 10 - sysdeps/x86_64/Makefile | 7 - sysdeps/x86_64/chacha20-amd64-avx2.S | 328 ---------- sysdeps/x86_64/chacha20-amd64-sse2.S | 311 ---------- sysdeps/x86_64/chacha20_arch.h | 55 -- 28 files changed, 52 insertions(+), 2663 deletions(-) delete mode 100644 stdlib/arc4random.h delete mode 100644 stdlib/chacha20.c delete mode 100644 stdlib/tst-arc4random-chacha20.c delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S delete mode 100644 sysdeps/aarch64/chacha20_arch.h delete mode 100644 sysdeps/generic/chacha20_arch.h delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h delete mode 100644 
sysdeps/x86_64/chacha20-amd64-avx2.S delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S delete mode 100644 sysdeps/x86_64/chacha20_arch.h diff --git a/LICENSES b/LICENSES index cd04fb6e84..530893b1dc 100644 --- a/LICENSES +++ b/LICENSES @@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov> You should have received a copy of the GNU Lesser General Public License along with this library; if not, see <https://www.gnu.org/licenses/>. */ -\f -sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S, -sysdeps/x86_64/chacha20-amd64-avx2.S, and -sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and -sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt, -with the following notices: - -Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - -This file is part of Libgcrypt. - -Libgcrypt is free software; you can redistribute it and/or modify -it under the terms of the GNU Lesser General Public License as -published by the Free Software Foundation; either version 2.1 of -the License, or (at your option) any later version. - -Libgcrypt is distributed in the hope that it will be useful, -but WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -GNU Lesser General Public License for more details. - -You should have received a copy of the GNU Lesser General Public -License along with this program; if not, see <https://www.gnu.org/licenses/>. diff --git a/include/stdlib.h b/include/stdlib.h index cae7f7cdf8..db51f4a4f6 100644 --- a/include/stdlib.h +++ b/include/stdlib.h @@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform; libc_hidden_proto (__arc4random_uniform); extern void __arc4random_buf_internal (void *buffer, size_t len) attribute_hidden; -/* Called from the fork function to reinitialize the internal cipher state - in child process. 
*/ -extern void __arc4random_fork_subprocess (void) attribute_hidden; extern double __strtod_internal (const char *__restrict __nptr, char **__restrict __endptr, int __group) diff --git a/stdlib/Makefile b/stdlib/Makefile index a900962685..f7b25c1981 100644 --- a/stdlib/Makefile +++ b/stdlib/Makefile @@ -246,7 +246,6 @@ tests := \ # tests tests-internal := \ - tst-arc4random-chacha20 \ tst-strtod1i \ tst-strtod3 \ tst-strtod4 \ @@ -256,7 +255,6 @@ tests-internal := \ # tests-internal tests-static := \ - tst-arc4random-chacha20 \ tst-secure-getenv \ # tests-static diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c index 65547e79aa..23a4167987 100644 --- a/stdlib/arc4random.c +++ b/stdlib/arc4random.c @@ -1,4 +1,4 @@ -/* Pseudo Random Number Generator based on ChaCha20. +/* Pseudo Random Number Generator Copyright (C) 2022 Free Software Foundation, Inc. This file is part of the GNU C Library. @@ -16,61 +16,14 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. */ -#include <arc4random.h> #include <errno.h> #include <not-cancel.h> #include <stdio.h> #include <stdlib.h> +#include <sys/poll.h> #include <sys/mman.h> #include <sys/param.h> #include <sys/random.h> -#include <tls-internal.h> - -/* arc4random keeps two counters: 'have' is the current valid bytes not yet - consumed in 'buf' while 'count' is the maximum number of bytes until a - reseed. - - Both the initial seed and reseed try to obtain entropy from the kernel - and abort the process if none could be obtained. - - The state 'buf' improves the usage of the cipher calls, allowing to call - optimized implementations (if the architecture provides it) and minimize - function call overhead. */ - -#include <chacha20.c> - -/* Called from the fork function to reset the state. 
*/ -void -__arc4random_fork_subprocess (void) -{ - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state; - if (state != NULL) - { - explicit_bzero (state, sizeof (*state)); - /* Force key init. */ - state->count = -1; - } -} - -/* Return the current thread random state or try to create one if there is - none available. In the case malloc can not allocate a state, arc4random - will try to get entropy with arc4random_getentropy. */ -static struct arc4random_state_t * -arc4random_get_state (void) -{ - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state; - if (state == NULL) - { - state = malloc (sizeof (struct arc4random_state_t)); - if (state != NULL) - { - /* Force key initialization on first call. */ - state->count = -1; - __glibc_tls_internal ()->rand_state = state; - } - } - return state; -} static void arc4random_getrandom_failure (void) @@ -78,106 +31,67 @@ arc4random_getrandom_failure (void) __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n"); } -static void -arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen) +void +__arc4random_buf (void *p, size_t n) { - chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf); - - /* Mix optional user provided data. */ - if (rnd != NULL) - { - size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); - for (size_t i = 0; i < m; i++) - state->buf[i] ^= rnd[i]; - } - - /* Immediately reinit for backtracking resistance. 
*/ - chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE); - explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); - state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); -} + static bool have_getrandom = true, seen_initialized = false; + int fd; -static void -arc4random_getentropy (void *rnd, size_t len) -{ - if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len) + if (n == 0) return; - int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom", - O_RDONLY | O_CLOEXEC)); - if (fd != -1) + for (;;) { - uint8_t *p = rnd; - uint8_t *end = p + len; - do - { - ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p)); - if (ret <= 0) - arc4random_getrandom_failure (); - p += ret; - } - while (p < end); - - if (__close_nocancel (fd) == 0) - return; + ssize_t l; + + if (!have_getrandom) + break; + + l = __getrandom_nocancel (p, n, 0); + if (l > 0) + { + if ((size_t) l == n) + return; /* Done reading, success. */ + p = (uint8_t *) p + l; + n -= l; + continue; /* Interrupted by a signal; keep going. */ + } + else if (l == 0) + arc4random_getrandom_failure (); /* Weird, should never happen. */ + else if (errno == ENOSYS) + { + have_getrandom = false; + break; /* No syscall, so fallback to /dev/urandom. */ + } + arc4random_getrandom_failure (); /* Unknown other error, should never happen. */ } - arc4random_getrandom_failure (); -} -/* Check if the thread context STATE should be reseed with kernel entropy - depending of requested LEN bytes. If there is less than requested, - the state is either initialized or reseeded, otherwise the internal - counter subtract the requested length. 
*/ -static void -arc4random_check_stir (struct arc4random_state_t *state, size_t len) -{ - if (state->count <= len || state->count == -1) + if (!seen_initialized) { - uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE]; - arc4random_getentropy (rnd, sizeof rnd); - - if (state->count == -1) - chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE); - else - arc4random_rekey (state, rnd, sizeof rnd); - - explicit_bzero (rnd, sizeof rnd); - - /* Invalidate the buf. */ - state->have = 0; - memset (state->buf, 0, sizeof state->buf); - state->count = CHACHA20_RESEED_SIZE; + struct pollfd pfd = { .events = POLLIN }; + pfd.fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY)); + if (pfd.fd < 0) + arc4random_getrandom_failure (); + if (__poll(&pfd, 1, -1) < 0) + arc4random_getrandom_failure (); + if (__close_nocancel(pfd.fd) < 0) + arc4random_getrandom_failure (); + seen_initialized = true; } - else - state->count -= len; -} -void -__arc4random_buf (void *buffer, size_t len) -{ - struct arc4random_state_t *state = arc4random_get_state (); - if (__glibc_unlikely (state == NULL)) - { - arc4random_getentropy (buffer, len); - return; - } - - arc4random_check_stir (state, len); - while (len > 0) + fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY); + if (fd < 0) + arc4random_getrandom_failure (); + while (n) { - if (state->have > 0) - { - size_t m = MIN (len, state->have); - uint8_t *ks = state->buf + sizeof (state->buf) - state->have; - memcpy (buffer, ks, m); - explicit_bzero (ks, m); - buffer += m; - len -= m; - state->have -= m; - } - if (state->have == 0) - arc4random_rekey (state, NULL, 0); + ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n)); + if (l <= 0) + arc4random_getrandom_failure (); + p = (uint8_t *) p + l; + n -= l; } + if (__close_nocancel (fd) < 0) + arc4random_getrandom_failure (); } libc_hidden_def (__arc4random_buf) weak_alias (__arc4random_buf, arc4random_buf) @@ -186,22 +100,7 @@ uint32_t __arc4random 
(void) { uint32_t r; - - struct arc4random_state_t *state = arc4random_get_state (); - if (__glibc_unlikely (state == NULL)) - { - arc4random_getentropy (&r, sizeof (uint32_t)); - return r; - } - - arc4random_check_stir (state, sizeof (uint32_t)); - if (state->have < sizeof (uint32_t)) - arc4random_rekey (state, NULL, 0); - uint8_t *ks = state->buf + sizeof (state->buf) - state->have; - memcpy (&r, ks, sizeof (uint32_t)); - memset (ks, 0, sizeof (uint32_t)); - state->have -= sizeof (uint32_t); - + __arc4random_buf(&r, sizeof(r)); return r; } libc_hidden_def (__arc4random) diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h deleted file mode 100644 index cd39389c19..0000000000 --- a/stdlib/arc4random.h +++ /dev/null @@ -1,48 +0,0 @@ -/* Arc4random definition used on TLS. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#ifndef _CHACHA20_H -#define _CHACHA20_H - -#include <stddef.h> -#include <stdint.h> - -/* Internal ChaCha20 state. */ -#define CHACHA20_STATE_LEN 16 -#define CHACHA20_BLOCK_SIZE 64 - -/* Maximum number bytes until reseed (16 MB). 
*/ -#define CHACHA20_RESEED_SIZE (16 * 1024 * 1024) - -/* Internal arc4random buffer, used on each feedback step so offer some - backtracking protection and to allow better used of vectorized - chacha20 implementations. */ -#define CHACHA20_BUFSIZE (8 * CHACHA20_BLOCK_SIZE) - -_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE"); - -struct arc4random_state_t -{ - uint32_t ctx[CHACHA20_STATE_LEN]; - size_t have; - size_t count; - uint8_t buf[CHACHA20_BUFSIZE]; -}; - -#endif diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c deleted file mode 100644 index 2745a81315..0000000000 --- a/stdlib/chacha20.c +++ /dev/null @@ -1,191 +0,0 @@ -/* Generic ChaCha20 implementation (used on arc4random). - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <array_length.h> -#include <endian.h> -#include <stddef.h> -#include <stdint.h> -#include <string.h> - -/* 32-bit stream position, then 96-bit nonce. */ -#define CHACHA20_IV_SIZE 16 -#define CHACHA20_KEY_SIZE 32 - -#define CHACHA20_STATE_LEN 16 - -/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final - XOR of the keystream with the plaintext because the plaintext is a - stream of zeros. 
*/ - -enum chacha20_constants -{ - CHACHA20_CONSTANT_EXPA = 0x61707865U, - CHACHA20_CONSTANT_ND_3 = 0x3320646eU, - CHACHA20_CONSTANT_2_BY = 0x79622d32U, - CHACHA20_CONSTANT_TE_K = 0x6b206574U -}; - -static inline uint32_t -read_unaligned_32 (const uint8_t *p) -{ - uint32_t r; - memcpy (&r, p, sizeof (r)); - return r; -} - -static inline void -write_unaligned_32 (uint8_t *p, uint32_t v) -{ - memcpy (p, &v, sizeof (v)); -} - -#if __BYTE_ORDER == __BIG_ENDIAN -# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p)) -# define set_state(v) __builtin_bswap32 ((v)) -#else -# define read_unaligned_le32(p) read_unaligned_32 ((p)) -# define set_state(v) (v) -#endif - -static inline void -chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv) -{ - state[0] = CHACHA20_CONSTANT_EXPA; - state[1] = CHACHA20_CONSTANT_ND_3; - state[2] = CHACHA20_CONSTANT_2_BY; - state[3] = CHACHA20_CONSTANT_TE_K; - - state[4] = read_unaligned_le32 (key + 0 * sizeof (uint32_t)); - state[5] = read_unaligned_le32 (key + 1 * sizeof (uint32_t)); - state[6] = read_unaligned_le32 (key + 2 * sizeof (uint32_t)); - state[7] = read_unaligned_le32 (key + 3 * sizeof (uint32_t)); - state[8] = read_unaligned_le32 (key + 4 * sizeof (uint32_t)); - state[9] = read_unaligned_le32 (key + 5 * sizeof (uint32_t)); - state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t)); - state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t)); - - state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t)); - state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t)); - state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t)); - state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t)); -} - -static inline uint32_t -rotl32 (unsigned int shift, uint32_t word) -{ - return (word << (shift & 31)) | (word >> ((-shift) & 31)); -} - -static void -state_final (const uint8_t *src, uint8_t *dst, uint32_t v) -{ -#ifdef CHACHA20_XOR_FINAL - v ^= read_unaligned_32 (src); -#endif - 
write_unaligned_32 (dst, v); -} - -static inline void -chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src) -{ - uint32_t x0, x1, x2, x3, x4, x5, x6, x7; - uint32_t x8, x9, x10, x11, x12, x13, x14, x15; - - x0 = state[0]; - x1 = state[1]; - x2 = state[2]; - x3 = state[3]; - x4 = state[4]; - x5 = state[5]; - x6 = state[6]; - x7 = state[7]; - x8 = state[8]; - x9 = state[9]; - x10 = state[10]; - x11 = state[11]; - x12 = state[12]; - x13 = state[13]; - x14 = state[14]; - x15 = state[15]; - - for (int i = 0; i < 20; i += 2) - { -#define QROUND(_x0, _x1, _x2, _x3) \ - do { \ - _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); \ - _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); \ - _x0 = _x0 + _x1; _x3 = rotl32 (8, (_x0 ^ _x3)); \ - _x2 = _x2 + _x3; _x1 = rotl32 (7, (_x1 ^ _x2)); \ - } while(0) - - QROUND (x0, x4, x8, x12); - QROUND (x1, x5, x9, x13); - QROUND (x2, x6, x10, x14); - QROUND (x3, x7, x11, x15); - - QROUND (x0, x5, x10, x15); - QROUND (x1, x6, x11, x12); - QROUND (x2, x7, x8, x13); - QROUND (x3, x4, x9, x14); - } - - state_final (&src[0], &dst[0], set_state (x0 + state[0])); - state_final (&src[4], &dst[4], set_state (x1 + state[1])); - state_final (&src[8], &dst[8], set_state (x2 + state[2])); - state_final (&src[12], &dst[12], set_state (x3 + state[3])); - state_final (&src[16], &dst[16], set_state (x4 + state[4])); - state_final (&src[20], &dst[20], set_state (x5 + state[5])); - state_final (&src[24], &dst[24], set_state (x6 + state[6])); - state_final (&src[28], &dst[28], set_state (x7 + state[7])); - state_final (&src[32], &dst[32], set_state (x8 + state[8])); - state_final (&src[36], &dst[36], set_state (x9 + state[9])); - state_final (&src[40], &dst[40], set_state (x10 + state[10])); - state_final (&src[44], &dst[44], set_state (x11 + state[11])); - state_final (&src[48], &dst[48], set_state (x12 + state[12])); - state_final (&src[52], &dst[52], set_state (x13 + state[13])); - state_final (&src[56], &dst[56], set_state (x14 + state[14])); 
- state_final (&src[60], &dst[60], set_state (x15 + state[15])); - - state[12]++; -} - -static void -__attribute_maybe_unused__ -chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - while (bytes >= CHACHA20_BLOCK_SIZE) - { - chacha20_block (state, dst, src); - - bytes -= CHACHA20_BLOCK_SIZE; - dst += CHACHA20_BLOCK_SIZE; - src += CHACHA20_BLOCK_SIZE; - } - - if (__glibc_unlikely (bytes != 0)) - { - uint8_t stream[CHACHA20_BLOCK_SIZE]; - chacha20_block (state, stream, src); - memcpy (dst, stream, bytes); - explicit_bzero (stream, sizeof stream); - } -} - -/* Get the architecture optimized version. */ -#include <chacha20_arch.h> diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c deleted file mode 100644 index 45ba54920d..0000000000 --- a/stdlib/tst-arc4random-chacha20.c +++ /dev/null @@ -1,167 +0,0 @@ -/* Basic tests for chacha20 cypher used in arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <arc4random.h> -#include <support/check.h> -#include <sys/cdefs.h> - -/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random - actual does. 
*/ -#include <chacha20.c> - -static int -do_test (void) -{ - const uint8_t key[CHACHA20_KEY_SIZE] = - { - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - }; - const uint8_t iv[CHACHA20_IV_SIZE] = - { - 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - }; - const uint8_t expected1[CHACHA20_BUFSIZE] = - { - 0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a, - 0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d, - 0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda, - 0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f, - 0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1, - 0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07, - 0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73, - 0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69, - 0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21, - 0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8, - 0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac, - 0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6, - 0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0, - 0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3, - 0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f, - 0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e, - 0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71, - 0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3, - 0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5, - 0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34, - 0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d, - 0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c, - 0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 
0x6a, 0x1e, 0x6c, 0x31, - 0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad, - 0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0, - 0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72, - 0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16, - 0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9, - 0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a, - 0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc, - 0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2, - 0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb, - 0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10, - 0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b, - 0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2, - 0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17, - 0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca, - 0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84, - 0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3, - 0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1, - 0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7, - 0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9, - 0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3, - 0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e, - 0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb, - 0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8, - 0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb - }; - - const uint8_t expected2[CHACHA20_BUFSIZE] = - { - 0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36, - 0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22, - 0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50, - 0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10, - 0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 
0xb8, 0x94, 0xbb, 0xef, - 0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30, - 0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c, - 0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6, - 0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e, - 0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90, - 0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3, - 0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba, - 0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4, - 0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea, - 0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a, - 0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80, - 0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2, - 0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8, - 0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce, - 0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb, - 0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9, - 0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc, - 0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86, - 0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0, - 0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4, - 0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1, - 0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4, - 0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f, - 0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57, - 0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97, - 0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef, - 0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13, - 0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65, - 0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30, - 
0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6, - 0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee, - 0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c, - 0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66, - 0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a, - 0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c, - 0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65, - 0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b, - 0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88, - 0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1, - 0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0, - 0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36, - 0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18 - }; - - /* Check with the expected internal arc4random keystream buffer. Some - architecture optimizations expects a buffer with a minimum size which - is a multiple of then ChaCha20 blocksize, so they might not be prepared - to handle smaller buffers. */ - - uint8_t output[CHACHA20_BUFSIZE]; - - uint32_t state[CHACHA20_STATE_LEN]; - chacha20_init (state, key, iv); - - /* Check with the initial state. */ - uint8_t input[CHACHA20_BUFSIZE] = { 0 }; - - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); - TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE); - - /* And on the next round. 
*/ - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); - TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE); - - return 0; -} - -#include <support/test-driver.c> diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile index 7dfd1b62dd..17fb1c5b72 100644 --- a/sysdeps/aarch64/Makefile +++ b/sysdeps/aarch64/Makefile @@ -51,10 +51,6 @@ ifeq ($(subdir),csu) gen-as-const-headers += tlsdesc.sym endif -ifeq ($(subdir),stdlib) -sysdep_routines += chacha20-aarch64 -endif - ifeq ($(subdir),gmon) CFLAGS-mcount.c += -mgeneral-regs-only endif diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S deleted file mode 100644 index cce5291c5c..0000000000 --- a/sysdeps/aarch64/chacha20-aarch64.S +++ /dev/null @@ -1,314 +0,0 @@ -/* Optimized AArch64 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. 
- - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. - */ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. */ - -#include <sysdep.h> - -/* Only LE is supported. */ -#ifdef __AARCH64EL__ - -#define GET_DATA_POINTER(reg, name) \ - adrp reg, name ; \ - add reg, reg, :lo12:name - -/* 'ret' instruction replacement for straight-line speculation mitigation */ -#define ret_spec_stop \ - ret; dsb sy; isb; - -.cpu generic+simd - -.text - -/* register macros */ -#define INPUT x0 -#define DST x1 -#define SRC x2 -#define NBLKS x3 -#define ROUND x4 -#define INPUT_CTR x5 -#define INPUT_POS x6 -#define CTR x7 - -/* vector registers */ -#define X0 v16 -#define X4 v17 -#define X8 v18 -#define X12 v19 - -#define X1 v20 -#define X5 v21 - -#define X9 v22 -#define X13 v23 -#define X2 v24 -#define X6 v25 - -#define X3 v26 -#define X7 v27 -#define X11 v28 -#define X15 v29 - -#define X10 v30 -#define X14 v31 - -#define VCTR v0 -#define VTMP0 v1 -#define VTMP1 v2 -#define VTMP2 v3 -#define VTMP3 v4 -#define X12_TMP v5 -#define X13_TMP v6 -#define ROT8 v7 - -/********************************************************************** - helper macros - **********************************************************************/ - -#define _(...) 
__VA_ARGS__ - -#define vpunpckldq(s1, s2, dst) \ - zip1 dst.4s, s2.4s, s1.4s; - -#define vpunpckhdq(s1, s2, dst) \ - zip2 dst.4s, s2.4s, s1.4s; - -#define vpunpcklqdq(s1, s2, dst) \ - zip1 dst.2d, s2.2d, s1.2d; - -#define vpunpckhqdq(s1, s2, dst) \ - zip2 dst.2d, s2.2d, s1.2d; - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \ - vpunpckhdq(x1, x0, t2); \ - vpunpckldq(x1, x0, x0); \ - \ - vpunpckldq(x3, x2, t1); \ - vpunpckhdq(x3, x2, x2); \ - \ - vpunpckhqdq(t1, x0, x1); \ - vpunpcklqdq(t1, x0, x0); \ - \ - vpunpckhqdq(x2, t2, x3); \ - vpunpcklqdq(x2, t2, x2); - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define XOR(d,s1,s2) \ - eor d.16b, s2.16b, s1.16b; - -#define PLUS(ds,s) \ - add ds.4s, ds.4s, s.4s; - -#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \ - shl dst1.4s, src1.4s, #(c); \ - shl dst2.4s, src2.4s, #(c); \ - shl dst3.4s, src3.4s, #(c); \ - shl dst4.4s, src4.4s, #(c); \ - sri dst1.4s, src1.4s, #(32 - (c)); \ - sri dst2.4s, src2.4s, #(32 - (c)); \ - sri dst3.4s, src3.4s, #(32 - (c)); \ - sri dst4.4s, src4.4s, #(32 - (c)); - -#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ - tbl dst1.16b, {src1.16b}, ROT8.16b; \ - tbl dst2.16b, {src2.16b}, ROT8.16b; \ - tbl dst3.16b, {src3.16b}, ROT8.16b; \ - tbl dst4.16b, {src4.16b}, ROT8.16b; - -#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ - rev32 dst1.8h, src1.8h; \ - rev32 dst2.8h, src2.8h; \ - rev32 dst3.8h, src3.8h; \ - rev32 dst4.8h, src4.8h; - -#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \ - PLUS(a1,b1); PLUS(a2,b2); \ - PLUS(a3,b3); PLUS(a4,b4); \ - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ - ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4); \ - PLUS(c1,d1); PLUS(c2,d2); \ - PLUS(c3,d3); PLUS(c4,d4); \ - XOR(tmp1,b1,c1); 
XOR(tmp2,b2,c2); \ - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ - ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4) \ - PLUS(a1,b1); PLUS(a2,b2); \ - PLUS(a3,b3); PLUS(a4,b4); \ - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ - ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4) \ - PLUS(c1,d1); PLUS(c2,d2); \ - PLUS(c3,d3); PLUS(c4,d4); \ - XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \ - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ - ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4) \ - -.align 4 -L(__chacha20_blocks4_data_inc_counter): - .long 0,1,2,3 - -.align 4 -L(__chacha20_blocks4_data_rot8): - .byte 3,0,1,2 - .byte 7,4,5,6 - .byte 11,8,9,10 - .byte 15,12,13,14 - -.hidden __chacha20_neon_blocks4 -ENTRY (__chacha20_neon_blocks4) - /* input: - * x0: input - * x1: dst - * x2: src - * x3: nblks (multiple of 4) - */ - - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8)) - add INPUT_CTR, INPUT, #(12*4); - ld1 {ROT8.16b}, [CTR]; - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter)) - mov INPUT_POS, INPUT; - ld1 {VCTR.16b}, [CTR]; - -L(loop4): - /* Construct counter vectors X12 and X13 */ - - ld1 {X15.16b}, [INPUT_CTR]; - mov ROUND, #20; - ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS]; - - dup X12.4s, X15.s[0]; - dup X13.4s, X15.s[1]; - ldr CTR, [INPUT_CTR]; - add X12.4s, X12.4s, VCTR.4s; - dup X0.4s, VTMP1.s[0]; - dup X1.4s, VTMP1.s[1]; - dup X2.4s, VTMP1.s[2]; - dup X3.4s, VTMP1.s[3]; - dup X14.4s, X15.s[2]; - cmhi VTMP0.4s, VCTR.4s, X12.4s; - dup X15.4s, X15.s[3]; - add CTR, CTR, #4; /* Update counter */ - dup X4.4s, VTMP2.s[0]; - dup X5.4s, VTMP2.s[1]; - dup X6.4s, VTMP2.s[2]; - dup X7.4s, VTMP2.s[3]; - sub X13.4s, X13.4s, VTMP0.4s; - dup X8.4s, VTMP3.s[0]; - dup X9.4s, VTMP3.s[1]; - dup X10.4s, VTMP3.s[2]; - dup X11.4s, VTMP3.s[3]; - mov X12_TMP.16b, X12.16b; - mov X13_TMP.16b, X13.16b; - str CTR, [INPUT_CTR]; - -L(round2): - subs ROUND, ROUND, #2 - QUARTERROUND4(X0, X4, X8, X12, X1, X5, X9, X13, - X2, X6, X10, X14, X3, X7, X11, X15, - 
tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) - QUARTERROUND4(X0, X5, X10, X15, X1, X6, X11, X12, - X2, X7, X8, X13, X3, X4, X9, X14, - tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) - b.ne L(round2); - - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32; - - PLUS(X12, X12_TMP); /* INPUT + 12 * 4 + counter */ - PLUS(X13, X13_TMP); /* INPUT + 13 * 4 + counter */ - - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */ - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */ - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */ - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */ - PLUS(X0, VTMP2); - PLUS(X1, VTMP3); - PLUS(X2, X12_TMP); - PLUS(X3, X13_TMP); - - dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */ - dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */ - dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */ - dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */ - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS]; - mov INPUT_POS, INPUT; - PLUS(X4, VTMP2); - PLUS(X5, VTMP3); - PLUS(X6, X12_TMP); - PLUS(X7, X13_TMP); - - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */ - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */ - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */ - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */ - dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */ - dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */ - PLUS(X8, VTMP2); - PLUS(X9, VTMP3); - PLUS(X10, X12_TMP); - PLUS(X11, X13_TMP); - PLUS(X14, VTMP0); - PLUS(X15, VTMP1); - - transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2); - transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2); - transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2); - transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2); - - subs NBLKS, NBLKS, #4; - - st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64 - st1 {X1.16b,X5.16b}, [DST], #32; - st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64 - st1 {X10.16b,X14.16b}, [DST], #32; - st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64; - - b.ne L(loop4); - - ret_spec_stop -END (__chacha20_neon_blocks4) - -#endif diff --git a/sysdeps/aarch64/chacha20_arch.h 
b/sysdeps/aarch64/chacha20_arch.h
deleted file mode 100644
index 37dbb917f1..0000000000
--- a/sysdeps/aarch64/chacha20_arch.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>. */
-
-#include <ldsodefs.h>
-#include <stdbool.h>
-
-unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
-                                      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-                size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-                  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
-                  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
-#ifdef __AARCH64EL__
-  __chacha20_neon_blocks4 (state, dst, src,
-                           CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  chacha20_crypt_generic (state, dst, src, bytes);
-#endif
-}
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
deleted file mode 100644
index 1b4559ccbc..0000000000
--- a/sysdeps/generic/chacha20_arch.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* Chacha20 implementation, generic interface for encrypt.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>. */
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-                size_t bytes)
-{
-  chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
index 8a0f37d509..b32b31b5a9 100644
--- a/sysdeps/generic/tls-internal.c
+++ b/sysdeps/generic/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>. */

-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>

@@ -27,13 +26,4 @@ __glibc_tls_internal_free (void)
 {
   free (__tls_internal.strsignal_buf);
   free (__tls_internal.strerror_l_buf);
-
-  if (__tls_internal.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-         cached it won't leak any data.
*/
-      explicit_bzero (__tls_internal.rand_state,
-                      sizeof (*__tls_internal.rand_state));
-      free (__tls_internal.rand_state);
-    }
 }
diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
index 667068c8cf..e60b86fab1 100644
--- a/sysdeps/mach/hurd/_Fork.c
+++ b/sysdeps/mach/hurd/_Fork.c
@@ -662,8 +662,6 @@ retry:
       _hurd_malloc_fork_child ();
       call_function_static_weak (__malloc_fork_unlock_child);

-      call_function_static_weak (__arc4random_fork_subprocess);
-
       /* Run things that want to run in the child task to set up. */
       RUN_HOOK (_hurd_fork_child_hook, ());

diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index 7dc02569f6..dd568992e2 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -43,8 +43,6 @@ _Fork (void)
       self->robust_head.list = &self->robust_head;
       INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
                              sizeof (struct robust_list_head));
-
-      call_function_static_weak (__arc4random_fork_subprocess);
     }
   return pid;
 }
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
deleted file mode 100644
index 8c75165f7f..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
+++ /dev/null
@@ -1,4 +0,0 @@
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
deleted file mode 100644
index cf9e735326..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
deleted file mode 100644
index 08494dc045..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
+++ /dev/null
@@ -1,42 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>. */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-                                        const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-                const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-                  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-                  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  unsigned long int hwcap = GLRO(dl_hwcap);
-  unsigned long int hwcap2 = GLRO(dl_hwcap2);
-  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
-    __chacha20_power8_blocks4 (state, dst, src,
-                               CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
index abb0aa3f11..71a59529f3 100644
--- a/sysdeps/powerpc/powerpc64/power8/Makefile
+++ b/sysdeps/powerpc/powerpc64/power8/Makefile
@@ -1,8 +1,3 @@
 ifeq ($(subdir),string)
 sysdep_routines += strcasestr-ppc64
 endif
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git
a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c deleted file mode 100644 index 0bbdcb9363..0000000000 --- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c +++ /dev/null @@ -1,256 +0,0 @@ -/* Optimized PowerPC implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20 - Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. 
- */ - -#include <altivec.h> -#include <endian.h> -#include <stddef.h> -#include <stdint.h> -#include <sys/cdefs.h> - -typedef vector unsigned char vector16x_u8; -typedef vector unsigned int vector4x_u32; -typedef vector unsigned long long vector2x_u64; - -#if __BYTE_ORDER == __BIG_ENDIAN -static const vector16x_u8 le_bswap_const = - { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 }; -#endif - -static inline vector4x_u32 -vec_rol_elems (vector4x_u32 v, unsigned int idx) -{ -#if __BYTE_ORDER != __BIG_ENDIAN - return vec_sld (v, v, (16 - (4 * idx)) & 15); -#else - return vec_sld (v, v, (4 * idx) & 15); -#endif -} - -static inline vector4x_u32 -vec_load_le (unsigned long offset, const unsigned char *ptr) -{ - vector4x_u32 vec; - vec = vec_vsx_ld (offset, (const uint32_t *)ptr); -#if __BYTE_ORDER == __BIG_ENDIAN - vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec, - le_bswap_const); -#endif - return vec; -} - -static inline void -vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr) -{ -#if __BYTE_ORDER == __BIG_ENDIAN - vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec, - le_bswap_const); -#endif - vec_vsx_st (vec, offset, (uint32_t *)ptr); -} - - -static inline vector4x_u32 -vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a) -{ -#if __BYTE_ORDER == __BIG_ENDIAN - static const vector16x_u8 swap32 = - { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 }; - vector2x_u64 vec, add, sum; - - vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32); - add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32); - sum = vec + add; - return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32); -#else - return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a)); -#endif -} - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define ROTATE(v1,rolv) \ - 
__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv)) - -#define PLUS(ds,s) \ - ((ds) += (s)) - -#define XOR(ds,s) \ - ((ds) ^= (s)) - -#define ADD_U64(v,a) \ - (v = vec_add_ctr_u64(v, a)) - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0, x1, x2, x3) ({ \ - vector4x_u32 t1 = vec_mergeh(x0, x2); \ - vector4x_u32 t2 = vec_mergel(x0, x2); \ - vector4x_u32 t3 = vec_mergeh(x1, x3); \ - x3 = vec_mergel(x1, x3); \ - x0 = vec_mergeh(t1, t3); \ - x1 = vec_mergel(t1, t3); \ - x2 = vec_mergeh(t2, x3); \ - x3 = vec_mergel(t2, x3); \ - }) - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2) \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE(d1, rotate_16); ROTATE(d2, rotate_16); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE(b1, rotate_12); ROTATE(b2, rotate_12); \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE(d1, rotate_8); ROTATE(d2, rotate_8); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE(b1, rotate_7); ROTATE(b2, rotate_7); - -unsigned int attribute_hidden -__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t nblks) -{ - vector4x_u32 counters_0123 = { 0, 1, 2, 3 }; - vector4x_u32 counter_4 = { 4, 0, 0, 0 }; - vector4x_u32 rotate_16 = { 16, 16, 16, 16 }; - vector4x_u32 rotate_12 = { 12, 12, 12, 12 }; - vector4x_u32 rotate_8 = { 8, 8, 8, 8 }; - vector4x_u32 rotate_7 = { 7, 7, 7, 7 }; - vector4x_u32 state0, state1, state2, state3; - vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7; - vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15; - vector4x_u32 tmp; - int i; - - /* Force preload of constants to vector registers. 
*/ - __asm__ ("": "+v" (counters_0123) :: "memory"); - __asm__ ("": "+v" (counter_4) :: "memory"); - __asm__ ("": "+v" (rotate_16) :: "memory"); - __asm__ ("": "+v" (rotate_12) :: "memory"); - __asm__ ("": "+v" (rotate_8) :: "memory"); - __asm__ ("": "+v" (rotate_7) :: "memory"); - - state0 = vec_vsx_ld (0 * 16, state); - state1 = vec_vsx_ld (1 * 16, state); - state2 = vec_vsx_ld (2 * 16, state); - state3 = vec_vsx_ld (3 * 16, state); - - do - { - v0 = vec_splat (state0, 0); - v1 = vec_splat (state0, 1); - v2 = vec_splat (state0, 2); - v3 = vec_splat (state0, 3); - v4 = vec_splat (state1, 0); - v5 = vec_splat (state1, 1); - v6 = vec_splat (state1, 2); - v7 = vec_splat (state1, 3); - v8 = vec_splat (state2, 0); - v9 = vec_splat (state2, 1); - v10 = vec_splat (state2, 2); - v11 = vec_splat (state2, 3); - v12 = vec_splat (state3, 0); - v13 = vec_splat (state3, 1); - v14 = vec_splat (state3, 2); - v15 = vec_splat (state3, 3); - - v12 += counters_0123; - v13 -= vec_cmplt (v12, counters_0123); - - for (i = 20; i > 0; i -= 2) - { - QUARTERROUND2 (v0, v4, v8, v12, v1, v5, v9, v13) - QUARTERROUND2 (v2, v6, v10, v14, v3, v7, v11, v15) - QUARTERROUND2 (v0, v5, v10, v15, v1, v6, v11, v12) - QUARTERROUND2 (v2, v7, v8, v13, v3, v4, v9, v14) - } - - v0 += vec_splat (state0, 0); - v1 += vec_splat (state0, 1); - v2 += vec_splat (state0, 2); - v3 += vec_splat (state0, 3); - v4 += vec_splat (state1, 0); - v5 += vec_splat (state1, 1); - v6 += vec_splat (state1, 2); - v7 += vec_splat (state1, 3); - v8 += vec_splat (state2, 0); - v9 += vec_splat (state2, 1); - v10 += vec_splat (state2, 2); - v11 += vec_splat (state2, 3); - tmp = vec_splat( state3, 0); - tmp += counters_0123; - v12 += tmp; - v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123); - v14 += vec_splat (state3, 2); - v15 += vec_splat (state3, 3); - ADD_U64 (state3, counter_4); - - transpose_4x4 (v0, v1, v2, v3); - transpose_4x4 (v4, v5, v6, v7); - transpose_4x4 (v8, v9, v10, v11); - transpose_4x4 (v12, v13, v14, v15); 
- - vec_store_le (v0, (64 * 0 + 16 * 0), dst); - vec_store_le (v1, (64 * 1 + 16 * 0), dst); - vec_store_le (v2, (64 * 2 + 16 * 0), dst); - vec_store_le (v3, (64 * 3 + 16 * 0), dst); - - vec_store_le (v4, (64 * 0 + 16 * 1), dst); - vec_store_le (v5, (64 * 1 + 16 * 1), dst); - vec_store_le (v6, (64 * 2 + 16 * 1), dst); - vec_store_le (v7, (64 * 3 + 16 * 1), dst); - - vec_store_le (v8, (64 * 0 + 16 * 2), dst); - vec_store_le (v9, (64 * 1 + 16 * 2), dst); - vec_store_le (v10, (64 * 2 + 16 * 2), dst); - vec_store_le (v11, (64 * 3 + 16 * 2), dst); - - vec_store_le (v12, (64 * 0 + 16 * 3), dst); - vec_store_le (v13, (64 * 1 + 16 * 3), dst); - vec_store_le (v14, (64 * 2 + 16 * 3), dst); - vec_store_le (v15, (64 * 3 + 16 * 3), dst); - - src += 4*64; - dst += 4*64; - - nblks -= 4; - } - while (nblks); - - vec_vsx_st (state3, 3 * 16, state); - - return 0; -} diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h deleted file mode 100644 index ded06762b6..0000000000 --- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h +++ /dev/null @@ -1,37 +0,0 @@ -/* PowerPC optimization for ChaCha20. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <stdbool.h> -#include <ldsodefs.h> - -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static void -chacha20_crypt (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, - "CHACHA20_BUFSIZE not multiple of 4"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); - - __chacha20_power8_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -} diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile index 96c110f490..66ed844e68 100644 --- a/sysdeps/s390/s390-64/Makefile +++ b/sysdeps/s390/s390-64/Makefile @@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache endif endif # $(subdir) == elf - -ifeq ($(subdir),stdlib) -sysdep_routines += \ - chacha20-s390x \ - # sysdep_routines -endif diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S deleted file mode 100644 index e38504d370..0000000000 --- a/sysdeps/s390/s390-64/chacha20-s390x.S +++ /dev/null @@ -1,573 +0,0 @@ -/* Optimized s390x implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -/* chacha20-s390x.S - zSeries implementation of ChaCha20 cipher - - Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. - */ - -#include <sysdep.h> - -#ifdef HAVE_S390_VX_ASM_SUPPORT - -/* CFA expressions are used for pointing CFA and registers to - * SP relative offsets. */ -# define DW_REGNO_SP 15 - -/* Fixed length encoding used for integers for now. 
*/ -# define DW_SLEB128_7BIT(value) \ - 0x00|((value) & 0x7f) -# define DW_SLEB128_28BIT(value) \ - 0x80|((value)&0x7f), \ - 0x80|(((value)>>7)&0x7f), \ - 0x80|(((value)>>14)&0x7f), \ - 0x00|(((value)>>21)&0x7f) - -# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \ - .cfi_escape \ - 0x0f, /* DW_CFA_def_cfa_expression */ \ - DW_SLEB128_7BIT(11), /* length */ \ - 0x7f, /* DW_OP_breg15, rsp + constant */ \ - DW_SLEB128_28BIT(rsp_offs), \ - 0x06, /* DW_OP_deref */ \ - 0x23, /* DW_OP_plus_constu */ \ - DW_SLEB128_28BIT((cfa_depth)+160) - -.machine "z13+vx" -.text - -.balign 16 -.Lconsts: -.Lwordswap: - .byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3 -.Lbswap128: - .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 -.Lbswap32: - .byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 -.Lone: - .long 0, 0, 0, 1 -.Ladd_counter_0123: - .long 0, 1, 2, 3 -.Ladd_counter_4567: - .long 4, 5, 6, 7 - -/* register macros */ -#define INPUT %r2 -#define DST %r3 -#define SRC %r4 -#define NBLKS %r0 -#define ROUND %r1 - -/* stack structure */ - -#define STACK_FRAME_STD (8 * 16 + 8 * 4) -#define STACK_FRAME_F8_F15 (8 * 8) -#define STACK_FRAME_Y0_Y15 (16 * 16) -#define STACK_FRAME_CTR (4 * 16) -#define STACK_FRAME_PARAMS (6 * 8) - -#define STACK_MAX (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \ - STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \ - STACK_FRAME_PARAMS) - -#define STACK_F8 (STACK_MAX - STACK_FRAME_F8_F15) -#define STACK_F9 (STACK_F8 + 8) -#define STACK_F10 (STACK_F9 + 8) -#define STACK_F11 (STACK_F10 + 8) -#define STACK_F12 (STACK_F11 + 8) -#define STACK_F13 (STACK_F12 + 8) -#define STACK_F14 (STACK_F13 + 8) -#define STACK_F15 (STACK_F14 + 8) -#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15) -#define STACK_CTR (STACK_Y0_Y15 - STACK_FRAME_CTR) -#define STACK_INPUT (STACK_CTR - STACK_FRAME_PARAMS) -#define STACK_DST (STACK_INPUT + 8) -#define STACK_SRC (STACK_DST + 8) -#define STACK_NBLKS (STACK_SRC + 8) -#define STACK_POCTX (STACK_NBLKS + 8) -#define STACK_POSRC 
(STACK_POCTX + 8) - -#define STACK_G0_H3 STACK_Y0_Y15 - -/* vector registers */ -#define A0 %v0 -#define A1 %v1 -#define A2 %v2 -#define A3 %v3 - -#define B0 %v4 -#define B1 %v5 -#define B2 %v6 -#define B3 %v7 - -#define C0 %v8 -#define C1 %v9 -#define C2 %v10 -#define C3 %v11 - -#define D0 %v12 -#define D1 %v13 -#define D2 %v14 -#define D3 %v15 - -#define E0 %v16 -#define E1 %v17 -#define E2 %v18 -#define E3 %v19 - -#define F0 %v20 -#define F1 %v21 -#define F2 %v22 -#define F3 %v23 - -#define G0 %v24 -#define G1 %v25 -#define G2 %v26 -#define G3 %v27 - -#define H0 %v28 -#define H1 %v29 -#define H2 %v30 -#define H3 %v31 - -#define IO0 E0 -#define IO1 E1 -#define IO2 E2 -#define IO3 E3 -#define IO4 F0 -#define IO5 F1 -#define IO6 F2 -#define IO7 F3 - -#define S0 G0 -#define S1 G1 -#define S2 G2 -#define S3 G3 - -#define TMP0 H0 -#define TMP1 H1 -#define TMP2 H2 -#define TMP3 H3 - -#define X0 A0 -#define X1 A1 -#define X2 A2 -#define X3 A3 -#define X4 B0 -#define X5 B1 -#define X6 B2 -#define X7 B3 -#define X8 C0 -#define X9 C1 -#define X10 C2 -#define X11 C3 -#define X12 D0 -#define X13 D1 -#define X14 D2 -#define X15 D3 - -#define Y0 E0 -#define Y1 E1 -#define Y2 E2 -#define Y3 E3 -#define Y4 F0 -#define Y5 F1 -#define Y6 F2 -#define Y7 F3 -#define Y8 G0 -#define Y9 G1 -#define Y10 G2 -#define Y11 G3 -#define Y12 H0 -#define Y13 H1 -#define Y14 H2 -#define Y15 H3 - -/********************************************************************** - helper macros - **********************************************************************/ - -#define _ /*_*/ - -#define START_STACK(last_r) \ - lgr %r0, %r15; \ - lghi %r1, ~15; \ - stmg %r6, last_r, 6 * 8(%r15); \ - aghi %r0, -STACK_MAX; \ - ngr %r0, %r1; \ - lgr %r1, %r15; \ - cfi_def_cfa_register(1); \ - lgr %r15, %r0; \ - stg %r1, 0(%r15); \ - cfi_cfa_on_stack(0, 0); \ - std %f8, STACK_F8(%r15); \ - std %f9, STACK_F9(%r15); \ - std %f10, STACK_F10(%r15); \ - std %f11, STACK_F11(%r15); \ - std %f12, STACK_F12(%r15); \ - std %f13, 
STACK_F13(%r15); \ - std %f14, STACK_F14(%r15); \ - std %f15, STACK_F15(%r15); - -#define END_STACK(last_r) \ - lg %r1, 0(%r15); \ - ld %f8, STACK_F8(%r15); \ - ld %f9, STACK_F9(%r15); \ - ld %f10, STACK_F10(%r15); \ - ld %f11, STACK_F11(%r15); \ - ld %f12, STACK_F12(%r15); \ - ld %f13, STACK_F13(%r15); \ - ld %f14, STACK_F14(%r15); \ - ld %f15, STACK_F15(%r15); \ - lmg %r6, last_r, 6 * 8(%r1); \ - lgr %r15, %r1; \ - cfi_def_cfa_register(DW_REGNO_SP); - -#define PLUS(dst,src) \ - vaf dst, dst, src; - -#define XOR(dst,src) \ - vx dst, dst, src; - -#define ROTATE(v1,c) \ - verllf v1, v1, (c)(0); - -#define WORD_ROTATE(v1,s) \ - vsldb v1, v1, v1, ((s) * 4); - -#define DST_8(OPER, I, J) \ - OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \ - OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J); - -/********************************************************************** - round macros - **********************************************************************/ - -/********************************************************************** - 8-way chacha20 ("vertical") - **********************************************************************/ - -#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ - x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,\ - y8,y9,y10,y11,y12,y13,y14,y15,\ - op1,op2,op3,op4,op5,op6,op7,op8,\ - op9,op10,op11,op12) \ - op1; \ - PLUS(x0, x1); PLUS(x4, x5); \ - PLUS(x8, x9); PLUS(x12, x13); \ - PLUS(y0, y1); PLUS(y4, y5); \ - PLUS(y8, y9); PLUS(y12, y13); \ - op2; \ - XOR(x3, x0); XOR(x7, x4); \ - XOR(x11, x8); XOR(x15, x12); \ - XOR(y3, y0); XOR(y7, y4); \ - XOR(y11, y8); XOR(y15, y12); \ - op3; \ - ROTATE(x3, 16); ROTATE(x7, 16); \ - ROTATE(x11, 16); ROTATE(x15, 16); \ - ROTATE(y3, 16); ROTATE(y7, 16); \ - ROTATE(y11, 16); ROTATE(y15, 16); \ - op4; \ - PLUS(x2, x3); PLUS(x6, x7); \ - PLUS(x10, x11); PLUS(x14, x15); \ - PLUS(y2, y3); PLUS(y6, y7); \ - PLUS(y10, y11); PLUS(y14, y15); \ - op5; \ - XOR(x1, x2); XOR(x5, x6); \ - 
XOR(x9, x10); XOR(x13, x14); \ - XOR(y1, y2); XOR(y5, y6); \ - XOR(y9, y10); XOR(y13, y14); \ - op6; \ - ROTATE(x1,12); ROTATE(x5,12); \ - ROTATE(x9,12); ROTATE(x13,12); \ - ROTATE(y1,12); ROTATE(y5,12); \ - ROTATE(y9,12); ROTATE(y13,12); \ - op7; \ - PLUS(x0, x1); PLUS(x4, x5); \ - PLUS(x8, x9); PLUS(x12, x13); \ - PLUS(y0, y1); PLUS(y4, y5); \ - PLUS(y8, y9); PLUS(y12, y13); \ - op8; \ - XOR(x3, x0); XOR(x7, x4); \ - XOR(x11, x8); XOR(x15, x12); \ - XOR(y3, y0); XOR(y7, y4); \ - XOR(y11, y8); XOR(y15, y12); \ - op9; \ - ROTATE(x3,8); ROTATE(x7,8); \ - ROTATE(x11,8); ROTATE(x15,8); \ - ROTATE(y3,8); ROTATE(y7,8); \ - ROTATE(y11,8); ROTATE(y15,8); \ - op10; \ - PLUS(x2, x3); PLUS(x6, x7); \ - PLUS(x10, x11); PLUS(x14, x15); \ - PLUS(y2, y3); PLUS(y6, y7); \ - PLUS(y10, y11); PLUS(y14, y15); \ - op11; \ - XOR(x1, x2); XOR(x5, x6); \ - XOR(x9, x10); XOR(x13, x14); \ - XOR(y1, y2); XOR(y5, y6); \ - XOR(y9, y10); XOR(y13, y14); \ - op12; \ - ROTATE(x1,7); ROTATE(x5,7); \ - ROTATE(x9,7); ROTATE(x13,7); \ - ROTATE(y1,7); ROTATE(y5,7); \ - ROTATE(y9,7); ROTATE(y13,7); - -#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \ - QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ - x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,\ - y8,y9,y10,y11,y12,y13,y14,y15,\ - ,,,,,,,,,,,) - -#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \ - vmrhf tmp0, v0, v1; \ - vmrhf tmp1, v2, v3; \ - vmrlf tmp2, v0, v1; \ - vmrlf v3, v2, v3; \ - vmrhf tmpa, va, vb; \ - vmrhf tmpb, vc, vd; \ - vmrlf tmpc, va, vb; \ - vmrlf vd, vc, vd; \ - vpdi v0, tmp0, tmp1, 0; \ - vpdi v1, tmp0, tmp1, 5; \ - vpdi v2, tmp2, v3, 0; \ - vpdi v3, tmp2, v3, 5; \ - vpdi va, tmpa, tmpb, 0; \ - vpdi vb, tmpa, tmpb, 5; \ - vpdi vc, tmpc, vd, 0; \ - vpdi vd, tmpc, vd, 5; - -.balign 8 -.globl __chacha20_s390x_vx_blocks8 -ENTRY (__chacha20_s390x_vx_blocks8) - /* input: - * %r2: input - * %r3: dst - * 
%r4: src - * %r5: nblks (multiple of 8) - */ - - START_STACK(%r8); - lgr NBLKS, %r5; - - larl %r7, .Lconsts; - - /* Load counter. */ - lg %r8, (12 * 4)(INPUT); - rllg %r8, %r8, 32; - -.balign 4 - /* Process eight chacha20 blocks per loop. */ -.Lloop8: - vlm Y0, Y3, 0(INPUT); - - slgfi NBLKS, 8; - lghi ROUND, (20 / 2); - - /* Construct counter vectors X12/X13 & Y12/Y13. */ - vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7); - vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7); - vrepf Y12, Y3, 0; - vrepf Y13, Y3, 1; - vaccf X5, Y12, X4; - vaccf Y5, Y12, Y4; - vaf X12, Y12, X4; - vaf Y12, Y12, Y4; - vaf X13, Y13, X5; - vaf Y13, Y13, Y5; - - vrepf X0, Y0, 0; - vrepf X1, Y0, 1; - vrepf X2, Y0, 2; - vrepf X3, Y0, 3; - vrepf X4, Y1, 0; - vrepf X5, Y1, 1; - vrepf X6, Y1, 2; - vrepf X7, Y1, 3; - vrepf X8, Y2, 0; - vrepf X9, Y2, 1; - vrepf X10, Y2, 2; - vrepf X11, Y2, 3; - vrepf X14, Y3, 2; - vrepf X15, Y3, 3; - - /* Store counters for blocks 0-7. */ - vstm X12, X13, (STACK_CTR + 0 * 16)(%r15); - vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15); - - vlr Y0, X0; - vlr Y1, X1; - vlr Y2, X2; - vlr Y3, X3; - vlr Y4, X4; - vlr Y5, X5; - vlr Y6, X6; - vlr Y7, X7; - vlr Y8, X8; - vlr Y9, X9; - vlr Y10, X10; - vlr Y11, X11; - vlr Y14, X14; - vlr Y15, X15; - - /* Update and store counter. */ - agfi %r8, 8; - rllg %r5, %r8, 32; - stg %r5, (12 * 4)(INPUT); - -.balign 4 -.Lround2_8: - QUARTERROUND4_V8(X0, X4, X8, X12, X1, X5, X9, X13, - X2, X6, X10, X14, X3, X7, X11, X15, - Y0, Y4, Y8, Y12, Y1, Y5, Y9, Y13, - Y2, Y6, Y10, Y14, Y3, Y7, Y11, Y15); - QUARTERROUND4_V8(X0, X5, X10, X15, X1, X6, X11, X12, - X2, X7, X8, X13, X3, X4, X9, X14, - Y0, Y5, Y10, Y15, Y1, Y6, Y11, Y12, - Y2, Y7, Y8, Y13, Y3, Y4, Y9, Y14); - brctg ROUND, .Lround2_8; - - /* Store blocks 4-7. */ - vstm Y0, Y15, STACK_Y0_Y15(%r15); - - /* Load counters for blocks 0-3. */ - vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15); - - lghi ROUND, 1; - j .Lfirst_output_4blks_8; - -.balign 4 -.Lsecond_output_4blks_8: - /* Load blocks 4-7. 
*/ - vlm X0, X15, STACK_Y0_Y15(%r15); - - /* Load counters for blocks 4-7. */ - vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15); - - lghi ROUND, 0; - -.balign 4 - /* Output four chacha20 blocks per loop. */ -.Lfirst_output_4blks_8: - vlm Y12, Y15, 0(INPUT); - PLUS(X12, Y0); - PLUS(X13, Y1); - vrepf Y0, Y12, 0; - vrepf Y1, Y12, 1; - vrepf Y2, Y12, 2; - vrepf Y3, Y12, 3; - vrepf Y4, Y13, 0; - vrepf Y5, Y13, 1; - vrepf Y6, Y13, 2; - vrepf Y7, Y13, 3; - vrepf Y8, Y14, 0; - vrepf Y9, Y14, 1; - vrepf Y10, Y14, 2; - vrepf Y11, Y14, 3; - vrepf Y14, Y15, 2; - vrepf Y15, Y15, 3; - PLUS(X0, Y0); - PLUS(X1, Y1); - PLUS(X2, Y2); - PLUS(X3, Y3); - PLUS(X4, Y4); - PLUS(X5, Y5); - PLUS(X6, Y6); - PLUS(X7, Y7); - PLUS(X8, Y8); - PLUS(X9, Y9); - PLUS(X10, Y10); - PLUS(X11, Y11); - PLUS(X14, Y14); - PLUS(X15, Y15); - - vl Y15, (.Lbswap32 - .Lconsts)(%r7); - TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7, - Y9, Y10, Y11, Y12, Y13, Y14); - TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15, - Y9, Y10, Y11, Y12, Y13, Y14); - - vlm Y0, Y14, 0(SRC); - vperm X0, X0, X0, Y15; - vperm X1, X1, X1, Y15; - vperm X2, X2, X2, Y15; - vperm X3, X3, X3, Y15; - vperm X4, X4, X4, Y15; - vperm X5, X5, X5, Y15; - vperm X6, X6, X6, Y15; - vperm X7, X7, X7, Y15; - vperm X8, X8, X8, Y15; - vperm X9, X9, X9, Y15; - vperm X10, X10, X10, Y15; - vperm X11, X11, X11, Y15; - vperm X12, X12, X12, Y15; - vperm X13, X13, X13, Y15; - vperm X14, X14, X14, Y15; - vperm X15, X15, X15, Y15; - vl Y15, (15 * 16)(SRC); - - XOR(Y0, X0); - XOR(Y1, X4); - XOR(Y2, X8); - XOR(Y3, X12); - XOR(Y4, X1); - XOR(Y5, X5); - XOR(Y6, X9); - XOR(Y7, X13); - XOR(Y8, X2); - XOR(Y9, X6); - XOR(Y10, X10); - XOR(Y11, X14); - XOR(Y12, X3); - XOR(Y13, X7); - XOR(Y14, X11); - XOR(Y15, X15); - vstm Y0, Y15, 0(DST); - - aghi SRC, 256; - aghi DST, 256; - - clgije ROUND, 1, .Lsecond_output_4blks_8; - - clgijhe NBLKS, 8, .Lloop8; - - - END_STACK(%r8); - xgr %r2, %r2; - br %r14; -END (__chacha20_s390x_vx_blocks8) - -#endif /* HAVE_S390_VX_ASM_SUPPORT */ diff 
--git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h deleted file mode 100644 index 0c6abf77e8..0000000000 --- a/sysdeps/s390/s390-64/chacha20_arch.h +++ /dev/null @@ -1,45 +0,0 @@ -/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <stdbool.h> -#include <ldsodefs.h> -#include <sys/auxv.h> - -unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static inline void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ -#ifdef HAVE_S390_VX_ASM_SUPPORT - _Static_assert (CHACHA20_BUFSIZE % 8 == 0, - "CHACHA20_BUFSIZE not multiple of 8"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); - - if (GLRO(dl_hwcap) & HWCAP_S390_VX) - { - __chacha20_s390x_vx_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); - return; - } -#endif - chacha20_crypt_generic (state, dst, src, bytes); -} diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c index 0326ebb767..c8a9ed2d40 100644 --- a/sysdeps/unix/sysv/linux/tls-internal.c +++ b/sysdeps/unix/sysv/linux/tls-internal.c @@ -16,7 +16,6 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. */ -#include <stdlib/arc4random.h> #include <string.h> #include <tls-internal.h> @@ -26,13 +25,4 @@ __glibc_tls_internal_free (void) struct pthread *self = THREAD_SELF; free (self->tls_state.strsignal_buf); free (self->tls_state.strerror_l_buf); - - if (self->tls_state.rand_state != NULL) - { - /* Clear any lingering random state prior so if the thread stack is - cached it won't leak any data. 
*/ - explicit_bzero (self->tls_state.rand_state, - sizeof (*self->tls_state.rand_state)); - free (self->tls_state.rand_state); - } } diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile index 1178475d75..c19bef2dec 100644 --- a/sysdeps/x86_64/Makefile +++ b/sysdeps/x86_64/Makefile @@ -5,13 +5,6 @@ ifeq ($(subdir),csu) gen-as-const-headers += link-defines.sym endif -ifeq ($(subdir),stdlib) -sysdep_routines += \ - chacha20-amd64-sse2 \ - chacha20-amd64-avx2 \ - # sysdep_routines -endif - ifeq ($(subdir),gmon) sysdep_routines += _mcount # We cannot compile _mcount.S with -pg because that would create diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S deleted file mode 100644 index aefd1cdbd0..0000000000 --- a/sysdeps/x86_64/chacha20-amd64-avx2.S +++ /dev/null @@ -1,328 +0,0 @@ -/* Optimized AVX2 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-amd64-avx2.S - AVX2 implementation of ChaCha20 cipher - - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. 
- - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. -*/ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. */ - -#include <sysdep.h> - -#ifdef PIC -# define rRIP (%rip) -#else -# define rRIP -#endif - -/* register macros */ -#define INPUT %rdi -#define DST %rsi -#define SRC %rdx -#define NBLKS %rcx -#define ROUND %eax - -/* stack structure */ -#define STACK_VEC_X12 (32) -#define STACK_VEC_X13 (32 + STACK_VEC_X12) -#define STACK_TMP (32 + STACK_VEC_X13) -#define STACK_TMP1 (32 + STACK_TMP) - -#define STACK_MAX (32 + STACK_TMP1) - -/* vector registers */ -#define X0 %ymm0 -#define X1 %ymm1 -#define X2 %ymm2 -#define X3 %ymm3 -#define X4 %ymm4 -#define X5 %ymm5 -#define X6 %ymm6 -#define X7 %ymm7 -#define X8 %ymm8 -#define X9 %ymm9 -#define X10 %ymm10 -#define X11 %ymm11 -#define X12 %ymm12 -#define X13 %ymm13 -#define X14 %ymm14 -#define X15 %ymm15 - -#define X0h %xmm0 -#define X1h %xmm1 -#define X2h %xmm2 -#define X3h %xmm3 -#define X4h %xmm4 -#define X5h %xmm5 -#define X6h %xmm6 -#define X7h %xmm7 -#define X8h %xmm8 -#define X9h %xmm9 -#define X10h %xmm10 -#define X11h %xmm11 -#define X12h %xmm12 -#define X13h %xmm13 -#define X14h %xmm14 -#define X15h %xmm15 - -/********************************************************************** - helper macros - 
**********************************************************************/ - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0,x1,x2,x3,t1,t2) \ - vpunpckhdq x1, x0, t2; \ - vpunpckldq x1, x0, x0; \ - \ - vpunpckldq x3, x2, t1; \ - vpunpckhdq x3, x2, x2; \ - \ - vpunpckhqdq t1, x0, x1; \ - vpunpcklqdq t1, x0, x0; \ - \ - vpunpckhqdq x2, t2, x3; \ - vpunpcklqdq x2, t2, x2; - -/* 2x2 128-bit matrix transpose */ -#define transpose_16byte_2x2(x0,x1,t1) \ - vmovdqa x0, t1; \ - vperm2i128 $0x20, x1, x0, x0; \ - vperm2i128 $0x31, x1, t1, x1; - -/********************************************************************** - 8-way chacha20 - **********************************************************************/ - -#define ROTATE2(v1,v2,c,tmp) \ - vpsrld $(32 - (c)), v1, tmp; \ - vpslld $(c), v1, v1; \ - vpaddb tmp, v1, v1; \ - vpsrld $(32 - (c)), v2, tmp; \ - vpslld $(c), v2, v2; \ - vpaddb tmp, v2, v2; - -#define ROTATE_SHUF_2(v1,v2,shuf) \ - vpshufb shuf, v1, v1; \ - vpshufb shuf, v2, v2; - -#define XOR(ds,s) \ - vpxor s, ds, ds; - -#define PLUS(ds,s) \ - vpaddd s, ds, ds; - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\ - interleave_op1,interleave_op2,\ - interleave_op3,interleave_op4) \ - vbroadcasti128 .Lshuf_rol16 rRIP, tmp1; \ - interleave_op1; \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE_SHUF_2(d1, d2, tmp1); \ - interleave_op2; \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 12, tmp1); \ - vbroadcasti128 .Lshuf_rol8 rRIP, tmp1; \ - interleave_op3; \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE_SHUF_2(d1, d2, tmp1); \ - interleave_op4; \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 7, tmp1); - - .section .text.avx2, "ax", @progbits - .align 32 -chacha20_data: -L(shuf_rol16): - .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13 -L(shuf_rol8): - .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14 -L(inc_counter): - .byte 0,1,2,3,4,5,6,7 -L(unsigned_cmp): - .long 
0x80000000 - - .hidden __chacha20_avx2_blocks8 -ENTRY (__chacha20_avx2_blocks8) - /* input: - * %rdi: input - * %rsi: dst - * %rdx: src - * %rcx: nblks (multiple of 8) - */ - vzeroupper; - - pushq %rbp; - cfi_adjust_cfa_offset(8); - cfi_rel_offset(rbp, 0) - movq %rsp, %rbp; - cfi_def_cfa_register(rbp); - - subq $STACK_MAX, %rsp; - andq $~31, %rsp; - -L(loop8): - mov $20, ROUND; - - /* Construct counter vectors X12 and X13 */ - vpmovzxbd L(inc_counter) rRIP, X0; - vpbroadcastd L(unsigned_cmp) rRIP, X2; - vpbroadcastd (12 * 4)(INPUT), X12; - vpbroadcastd (13 * 4)(INPUT), X13; - vpaddd X0, X12, X12; - vpxor X2, X0, X0; - vpxor X2, X12, X1; - vpcmpgtd X1, X0, X0; - vpsubd X0, X13, X13; - vmovdqa X12, (STACK_VEC_X12)(%rsp); - vmovdqa X13, (STACK_VEC_X13)(%rsp); - - /* Load vectors */ - vpbroadcastd (0 * 4)(INPUT), X0; - vpbroadcastd (1 * 4)(INPUT), X1; - vpbroadcastd (2 * 4)(INPUT), X2; - vpbroadcastd (3 * 4)(INPUT), X3; - vpbroadcastd (4 * 4)(INPUT), X4; - vpbroadcastd (5 * 4)(INPUT), X5; - vpbroadcastd (6 * 4)(INPUT), X6; - vpbroadcastd (7 * 4)(INPUT), X7; - vpbroadcastd (8 * 4)(INPUT), X8; - vpbroadcastd (9 * 4)(INPUT), X9; - vpbroadcastd (10 * 4)(INPUT), X10; - vpbroadcastd (11 * 4)(INPUT), X11; - vpbroadcastd (14 * 4)(INPUT), X14; - vpbroadcastd (15 * 4)(INPUT), X15; - vmovdqa X15, (STACK_TMP)(%rsp); - -L(round2): - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X15,,,,) - vmovdqa (STACK_TMP)(%rsp), X15; - vmovdqa X8, (STACK_TMP)(%rsp); - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,,,,) - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,,,,) - vmovdqa (STACK_TMP)(%rsp), X8; - vmovdqa X15, (STACK_TMP)(%rsp); - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X15,,,,) - sub $2, ROUND; - jnz L(round2); - - vmovdqa X8, (STACK_TMP1)(%rsp); - - /* tmp := X15 */ - vpbroadcastd (0 * 4)(INPUT), X15; - PLUS(X0, X15); - vpbroadcastd (1 * 4)(INPUT), X15; - PLUS(X1, X15); - vpbroadcastd (2 * 4)(INPUT), X15; - PLUS(X2, X15); - vpbroadcastd (3 
* 4)(INPUT), X15; - PLUS(X3, X15); - vpbroadcastd (4 * 4)(INPUT), X15; - PLUS(X4, X15); - vpbroadcastd (5 * 4)(INPUT), X15; - PLUS(X5, X15); - vpbroadcastd (6 * 4)(INPUT), X15; - PLUS(X6, X15); - vpbroadcastd (7 * 4)(INPUT), X15; - PLUS(X7, X15); - transpose_4x4(X0, X1, X2, X3, X8, X15); - transpose_4x4(X4, X5, X6, X7, X8, X15); - vmovdqa (STACK_TMP1)(%rsp), X8; - transpose_16byte_2x2(X0, X4, X15); - transpose_16byte_2x2(X1, X5, X15); - transpose_16byte_2x2(X2, X6, X15); - transpose_16byte_2x2(X3, X7, X15); - vmovdqa (STACK_TMP)(%rsp), X15; - vmovdqu X0, (64 * 0 + 16 * 0)(DST) - vmovdqu X1, (64 * 1 + 16 * 0)(DST) - vpbroadcastd (8 * 4)(INPUT), X0; - PLUS(X8, X0); - vpbroadcastd (9 * 4)(INPUT), X0; - PLUS(X9, X0); - vpbroadcastd (10 * 4)(INPUT), X0; - PLUS(X10, X0); - vpbroadcastd (11 * 4)(INPUT), X0; - PLUS(X11, X0); - vmovdqa (STACK_VEC_X12)(%rsp), X0; - PLUS(X12, X0); - vmovdqa (STACK_VEC_X13)(%rsp), X0; - PLUS(X13, X0); - vpbroadcastd (14 * 4)(INPUT), X0; - PLUS(X14, X0); - vpbroadcastd (15 * 4)(INPUT), X0; - PLUS(X15, X0); - vmovdqu X2, (64 * 2 + 16 * 0)(DST) - vmovdqu X3, (64 * 3 + 16 * 0)(DST) - - /* Update counter */ - addq $8, (12 * 4)(INPUT); - - transpose_4x4(X8, X9, X10, X11, X0, X1); - transpose_4x4(X12, X13, X14, X15, X0, X1); - vmovdqu X4, (64 * 4 + 16 * 0)(DST) - vmovdqu X5, (64 * 5 + 16 * 0)(DST) - transpose_16byte_2x2(X8, X12, X0); - transpose_16byte_2x2(X9, X13, X0); - transpose_16byte_2x2(X10, X14, X0); - transpose_16byte_2x2(X11, X15, X0); - vmovdqu X6, (64 * 6 + 16 * 0)(DST) - vmovdqu X7, (64 * 7 + 16 * 0)(DST) - vmovdqu X8, (64 * 0 + 16 * 2)(DST) - vmovdqu X9, (64 * 1 + 16 * 2)(DST) - vmovdqu X10, (64 * 2 + 16 * 2)(DST) - vmovdqu X11, (64 * 3 + 16 * 2)(DST) - vmovdqu X12, (64 * 4 + 16 * 2)(DST) - vmovdqu X13, (64 * 5 + 16 * 2)(DST) - vmovdqu X14, (64 * 6 + 16 * 2)(DST) - vmovdqu X15, (64 * 7 + 16 * 2)(DST) - - sub $8, NBLKS; - lea (8 * 64)(DST), DST; - lea (8 * 64)(SRC), SRC; - jnz L(loop8); - - vzeroupper; - - /* eax zeroed by round loop. 
*/ - leave; - cfi_adjust_cfa_offset(-8) - cfi_def_cfa_register(%rsp); - ret; - int3; -END(__chacha20_avx2_blocks8) diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S deleted file mode 100644 index 351a1109c6..0000000000 --- a/sysdeps/x86_64/chacha20-amd64-sse2.S +++ /dev/null @@ -1,311 +0,0 @@ -/* Optimized SSE2 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-amd64-ssse3.S - SSSE3 implementation of ChaCha20 cipher - - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. 
-*/ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. */ - -#include <sysdep.h> -#include <isa-level.h> - -#if MINIMUM_X86_ISA_LEVEL <= 2 - -#ifdef PIC -# define rRIP (%rip) -#else -# define rRIP -#endif - -/* 'ret' instruction replacement for straight-line speculation mitigation */ -#define ret_spec_stop \ - ret; int3; - -/* register macros */ -#define INPUT %rdi -#define DST %rsi -#define SRC %rdx -#define NBLKS %rcx -#define ROUND %eax - -/* stack structure */ -#define STACK_VEC_X12 (16) -#define STACK_VEC_X13 (16 + STACK_VEC_X12) -#define STACK_TMP (16 + STACK_VEC_X13) -#define STACK_TMP1 (16 + STACK_TMP) -#define STACK_TMP2 (16 + STACK_TMP1) - -#define STACK_MAX (16 + STACK_TMP2) - -/* vector registers */ -#define X0 %xmm0 -#define X1 %xmm1 -#define X2 %xmm2 -#define X3 %xmm3 -#define X4 %xmm4 -#define X5 %xmm5 -#define X6 %xmm6 -#define X7 %xmm7 -#define X8 %xmm8 -#define X9 %xmm9 -#define X10 %xmm10 -#define X11 %xmm11 -#define X12 %xmm12 -#define X13 %xmm13 -#define X14 %xmm14 -#define X15 %xmm15 - -/********************************************************************** - helper macros - **********************************************************************/ - -/* 4x4 32-bit integer matrix transpose */ -#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \ - movdqa x0, t2; \ - punpckhdq x1, t2; \ - punpckldq x1, x0; \ - \ - movdqa x2, t1; \ - punpckldq x3, t1; \ - punpckhdq x3, x2; \ - \ - movdqa x0, x1; \ - punpckhqdq t1, x1; \ - punpcklqdq t1, x0; \ - \ - movdqa t2, x3; \ - punpckhqdq x2, x3; \ - punpcklqdq x2, t2; \ - movdqa t2, x2; - -/* fill xmm register with 32-bit value from memory */ -#define PBROADCASTD(mem32, xreg) \ - movd mem32, xreg; \ - pshufd $0, xreg, xreg; - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define 
ROTATE2(v1,v2,c,tmp1,tmp2) \ - movdqa v1, tmp1; \ - movdqa v2, tmp2; \ - psrld $(32 - (c)), v1; \ - pslld $(c), tmp1; \ - paddb tmp1, v1; \ - psrld $(32 - (c)), v2; \ - pslld $(c), tmp2; \ - paddb tmp2, v2; - -#define XOR(ds,s) \ - pxor s, ds; - -#define PLUS(ds,s) \ - paddd s, ds; - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2) \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE2(d1, d2, 16, tmp1, tmp2); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 12, tmp1, tmp2); \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE2(d1, d2, 8, tmp1, tmp2); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 7, tmp1, tmp2); - - .section .text.sse2,"ax",@progbits - -chacha20_data: - .align 16 -L(counter1): - .long 1,0,0,0 -L(inc_counter): - .long 0,1,2,3 -L(unsigned_cmp): - .long 0x80000000,0x80000000,0x80000000,0x80000000 - - .hidden __chacha20_sse2_blocks4 -ENTRY (__chacha20_sse2_blocks4) - /* input: - * %rdi: input - * %rsi: dst - * %rdx: src - * %rcx: nblks (multiple of 4) - */ - - pushq %rbp; - cfi_adjust_cfa_offset(8); - cfi_rel_offset(rbp, 0) - movq %rsp, %rbp; - cfi_def_cfa_register(%rbp); - - subq $STACK_MAX, %rsp; - andq $~15, %rsp; - -L(loop4): - mov $20, ROUND; - - /* Construct counter vectors X12 and X13 */ - movdqa L(inc_counter) rRIP, X0; - movdqa L(unsigned_cmp) rRIP, X2; - PBROADCASTD((12 * 4)(INPUT), X12); - PBROADCASTD((13 * 4)(INPUT), X13); - paddd X0, X12; - movdqa X12, X1; - pxor X2, X0; - pxor X2, X1; - pcmpgtd X1, X0; - psubd X0, X13; - movdqa X12, (STACK_VEC_X12)(%rsp); - movdqa X13, (STACK_VEC_X13)(%rsp); - - /* Load vectors */ - PBROADCASTD((0 * 4)(INPUT), X0); - PBROADCASTD((1 * 4)(INPUT), X1); - PBROADCASTD((2 * 4)(INPUT), X2); - PBROADCASTD((3 * 4)(INPUT), X3); - PBROADCASTD((4 * 4)(INPUT), X4); - PBROADCASTD((5 * 4)(INPUT), X5); - PBROADCASTD((6 * 4)(INPUT), X6); - PBROADCASTD((7 * 4)(INPUT), X7); - PBROADCASTD((8 * 4)(INPUT), X8); - PBROADCASTD((9 * 
4)(INPUT), X9); - PBROADCASTD((10 * 4)(INPUT), X10); - PBROADCASTD((11 * 4)(INPUT), X11); - PBROADCASTD((14 * 4)(INPUT), X14); - PBROADCASTD((15 * 4)(INPUT), X15); - movdqa X11, (STACK_TMP)(%rsp); - movdqa X15, (STACK_TMP1)(%rsp); - -L(round2_4): - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X11,X15) - movdqa (STACK_TMP)(%rsp), X11; - movdqa (STACK_TMP1)(%rsp), X15; - movdqa X8, (STACK_TMP)(%rsp); - movdqa X9, (STACK_TMP1)(%rsp); - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,X9) - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,X9) - movdqa (STACK_TMP)(%rsp), X8; - movdqa (STACK_TMP1)(%rsp), X9; - movdqa X11, (STACK_TMP)(%rsp); - movdqa X15, (STACK_TMP1)(%rsp); - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X11,X15) - sub $2, ROUND; - jnz L(round2_4); - - /* tmp := X15 */ - movdqa (STACK_TMP)(%rsp), X11; - PBROADCASTD((0 * 4)(INPUT), X15); - PLUS(X0, X15); - PBROADCASTD((1 * 4)(INPUT), X15); - PLUS(X1, X15); - PBROADCASTD((2 * 4)(INPUT), X15); - PLUS(X2, X15); - PBROADCASTD((3 * 4)(INPUT), X15); - PLUS(X3, X15); - PBROADCASTD((4 * 4)(INPUT), X15); - PLUS(X4, X15); - PBROADCASTD((5 * 4)(INPUT), X15); - PLUS(X5, X15); - PBROADCASTD((6 * 4)(INPUT), X15); - PLUS(X6, X15); - PBROADCASTD((7 * 4)(INPUT), X15); - PLUS(X7, X15); - PBROADCASTD((8 * 4)(INPUT), X15); - PLUS(X8, X15); - PBROADCASTD((9 * 4)(INPUT), X15); - PLUS(X9, X15); - PBROADCASTD((10 * 4)(INPUT), X15); - PLUS(X10, X15); - PBROADCASTD((11 * 4)(INPUT), X15); - PLUS(X11, X15); - movdqa (STACK_VEC_X12)(%rsp), X15; - PLUS(X12, X15); - movdqa (STACK_VEC_X13)(%rsp), X15; - PLUS(X13, X15); - movdqa X13, (STACK_TMP)(%rsp); - PBROADCASTD((14 * 4)(INPUT), X15); - PLUS(X14, X15); - movdqa (STACK_TMP1)(%rsp), X15; - movdqa X14, (STACK_TMP1)(%rsp); - PBROADCASTD((15 * 4)(INPUT), X13); - PLUS(X15, X13); - movdqa X15, (STACK_TMP2)(%rsp); - - /* Update counter */ - addq $4, (12 * 4)(INPUT); - - TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15); - movdqu X0, (64 * 0 + 16 * 0)(DST) - 
movdqu X1, (64 * 1 + 16 * 0)(DST) - movdqu X2, (64 * 2 + 16 * 0)(DST) - movdqu X3, (64 * 3 + 16 * 0)(DST) - TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2); - movdqa (STACK_TMP)(%rsp), X13; - movdqa (STACK_TMP1)(%rsp), X14; - movdqa (STACK_TMP2)(%rsp), X15; - movdqu X4, (64 * 0 + 16 * 1)(DST) - movdqu X5, (64 * 1 + 16 * 1)(DST) - movdqu X6, (64 * 2 + 16 * 1)(DST) - movdqu X7, (64 * 3 + 16 * 1)(DST) - TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2); - movdqu X8, (64 * 0 + 16 * 2)(DST) - movdqu X9, (64 * 1 + 16 * 2)(DST) - movdqu X10, (64 * 2 + 16 * 2)(DST) - movdqu X11, (64 * 3 + 16 * 2)(DST) - TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2); - movdqu X12, (64 * 0 + 16 * 3)(DST) - movdqu X13, (64 * 1 + 16 * 3)(DST) - movdqu X14, (64 * 2 + 16 * 3)(DST) - movdqu X15, (64 * 3 + 16 * 3)(DST) - - sub $4, NBLKS; - lea (4 * 64)(DST), DST; - lea (4 * 64)(SRC), SRC; - jnz L(loop4); - - /* eax zeroed by round loop. */ - leave; - cfi_adjust_cfa_offset(-8) - cfi_def_cfa_register(%rsp); - ret_spec_stop; -END (__chacha20_sse2_blocks4) - -#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */ diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h deleted file mode 100644 index 6f3784e392..0000000000 --- a/sysdeps/x86_64/chacha20_arch.h +++ /dev/null @@ -1,55 +0,0 @@ -/* Chacha20 implementation, used on arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. 
- - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <isa-level.h> -#include <ldsodefs.h> -#include <cpu-features.h> -#include <sys/param.h> - -unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; -unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static inline void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0, - "CHACHA20_BUFSIZE not multiple of 4 or 8"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); - -#if MINIMUM_X86_ISA_LEVEL > 2 - __chacha20_avx2_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -#else - const struct cpu_features* cpu_features = __get_cpu_features (); - - /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. */ - if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) - && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !)) - __chacha20_avx2_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); - else - __chacha20_sse2_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -#endif -} -- 2.35.1 ^ permalink raw reply [flat|nested] 81+ messages in thread
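For readers following the removed vector code: the QUARTERROUND macros in the deleted s390x, AVX2 and SSE2 files all apply the same scalar operation to several blocks at once, with rotation counts 16, 12, 8 and 7. A minimal scalar sketch in C is shown here for reference. The helper names rotl32 and chacha_quarterround are illustrative assumptions and do not appear in the patch; the operation itself follows the quarter round as specified in RFC 8439, section 2.1.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by c bits (0 < c < 32).  */
static inline uint32_t
rotl32 (uint32_t v, unsigned int c)
{
  return (v << c) | (v >> (32 - c));
}

/* Scalar ChaCha quarter round on four state words (RFC 8439, 2.1).
   Hypothetical helper name, used only for illustration.  The vector
   macros in the removed files do the same steps on 4 or 8 lanes.  */
static void
chacha_quarterround (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 7);
}
```

The AVX2 file implements the rotations by 16 and 8 as byte shuffles (ROTATE_SHUF_2 with the shuf_rol16 and shuf_rol8 tables), while the SSE2 file uses a shift pair for every rotation; both should produce the same words as the scalar form above.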
* Re: [PATCH] arc4random: simplify design for better safety
  2022-07-25 22:57 ` [PATCH] arc4random: simplify design for better safety Jason A. Donenfeld
@ 2022-07-25 23:11   ` Jason A. Donenfeld
  2022-07-25 23:28   ` [PATCH v2] " Jason A. Donenfeld
  2022-07-26 13:30   ` [PATCH v4] " Jason A. Donenfeld
  2 siblings, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-25 23:11 UTC (permalink / raw)
  To: libc-alpha
  Cc: Adhemerval Zanella Netto, Florian Weimer, Cristian Rodríguez,
	Paul Eggert, linux-crypto

If you're just following along on the mailing list, without actively
trying to apply this to a glibc tree, that diff might be hard to read.
The meat of it is the below function implementation. Notably this is
basically the same as systemd's crypto_random_bytes() (which I recently
rewrote there).

void
__arc4random_buf (void *p, size_t n)
{
  static bool have_getrandom = true, seen_initialized = false;
  int fd;

  if (n == 0)
    return;

  for (;;)
    {
      ssize_t l;

      if (!have_getrandom)
        break;

      l = __getrandom_nocancel (p, n, 0);
      if (l > 0)
        {
          if ((size_t) l == n)
            return; /* Done reading, success. */
          p = (uint8_t *) p + l;
          n -= l;
          continue; /* Interrupted by a signal; keep going. */
        }
      else if (l == 0)
        arc4random_getrandom_failure (); /* Weird, should never happen. */
      else if (errno == ENOSYS)
        {
          have_getrandom = false;
          break; /* No syscall, so fallback to /dev/urandom. */
        }
      arc4random_getrandom_failure (); /* Unknown other error, should never happen. */
    }

  if (!seen_initialized)
    {
      struct pollfd pfd = { .events = POLLIN };
      pfd.fd = TEMP_FAILURE_RETRY (
          __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
      if (pfd.fd < 0)
        arc4random_getrandom_failure ();
      if (__poll(&pfd, 1, -1) < 0)
        arc4random_getrandom_failure ();
      if (__close_nocancel(pfd.fd) < 0)
        arc4random_getrandom_failure ();
      seen_initialized = true;
    }

  fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY);
  if (fd < 0)
    arc4random_getrandom_failure ();
  while (n)
    {
      ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
      if (l <= 0)
        arc4random_getrandom_failure ();
      p = (uint8_t *) p + l;
      n -= l;
    }
  if (__close_nocancel (fd) < 0)
    arc4random_getrandom_failure ();
}
libc_hidden_def (__arc4random_buf)
weak_alias (__arc4random_buf, arc4random_buf)

^ permalink raw reply	[flat|nested] 81+ messages in thread
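For anyone wanting to try the basic idea outside of a glibc tree, the getrandom() loop above can be approximated with only the public wrapper from <sys/random.h>. This is a rough sketch, not the code from the patch: the name fill_random is hypothetical, it returns an error instead of aborting, it retries on EINTR (a choice made for this sketch), and it leaves out the /dev/urandom fallback entirely, so it assumes a kernel and libc that provide getrandom().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper, not part of glibc or of the patch: fill buf
   with n random bytes via getrandom (..., 0).  Returns 0 on success,
   -1 with errno set on failure.  */
static int
fill_random (void *buf, size_t n)
{
  uint8_t *p = buf;

  while (n > 0)
    {
      ssize_t l = getrandom (p, n, 0);
      if (l < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted before any data; retry.  */
          return -1;            /* ENOSYS or another error; caller decides.  */
        }
      if (l == 0)
        {
          errno = EIO;          /* Should not happen for n > 0.  */
          return -1;
        }
      p += l;                   /* Short read; keep going.  */
      n -= (size_t) l;
    }
  return 0;
}
```

A caller would use it as `if (fill_random (key, sizeof key) != 0) abort ();`, which mirrors the fail-hard behaviour of arc4random_getrandom_failure() in the function above.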
* [PATCH v2] arc4random: simplify design for better safety
  2022-07-25 22:57 ` [PATCH] arc4random: simplify design for better safety Jason A. Donenfeld
  2022-07-25 23:11   ` Jason A. Donenfeld
@ 2022-07-25 23:28   ` Jason A. Donenfeld
  2022-07-25 23:59     ` Eric Biggers
                       ` (3 more replies)
  2022-07-26 13:30   ` [PATCH v4] " Jason A. Donenfeld
  2 siblings, 4 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-25 23:28 UTC (permalink / raw)
  To: libc-alpha
  Cc: Jason A. Donenfeld, Adhemerval Zanella Netto, Florian Weimer,
	Cristian Rodríguez, Paul Eggert, linux-crypto

Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtless slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc can work
together on optimizing that.

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering altogether.
Future improvements may include userspace learning more from the kernel
about when to do that, which might make these sorts of chacha20-based
optimizations more feasible. The current heuristic of 16 MiB is
meaningless garbage that doesn't correspond to anything the kernel
might know about. So for now, let's just do something conservative that
we know is correct and won't lead to cryptographic issues for users of
this function.

This patch might be considered along the lines of, "premature
optimization is the root of all evil," in that the much more complex
implementation it replaces moves too fast without considering security
implications, whereas the incremental approach done here is a much
safer way of going about things. Once this lands, we can take our time
in optimizing this properly using new interplay between the kernel and
userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fall back to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is
a rough approximation, as the ancient "non-blocking pool" was
initialized after the "blocking pool", not before, but it's the best
approximation we can do.

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A.
Donenfeld <Jason@zx2c4.com> --- LICENSES | 23 - include/stdlib.h | 3 - stdlib/Makefile | 2 - stdlib/arc4random.c | 204 ++----- stdlib/arc4random.h | 48 -- stdlib/chacha20.c | 191 ------ stdlib/tst-arc4random-chacha20.c | 167 ----- sysdeps/aarch64/Makefile | 4 - sysdeps/aarch64/chacha20-aarch64.S | 314 ---------- sysdeps/aarch64/chacha20_arch.h | 40 -- sysdeps/generic/chacha20_arch.h | 24 - sysdeps/generic/tls-internal.c | 10 - sysdeps/mach/hurd/_Fork.c | 2 - sysdeps/nptl/_Fork.c | 2 - .../powerpc/powerpc64/be/multiarch/Makefile | 4 - .../powerpc64/be/multiarch/chacha20-ppc.c | 1 - .../powerpc64/be/multiarch/chacha20_arch.h | 42 -- sysdeps/powerpc/powerpc64/power8/Makefile | 5 - .../powerpc/powerpc64/power8/chacha20-ppc.c | 256 -------- .../powerpc/powerpc64/power8/chacha20_arch.h | 37 -- sysdeps/s390/s390-64/Makefile | 6 - sysdeps/s390/s390-64/chacha20-s390x.S | 573 ------------------ sysdeps/s390/s390-64/chacha20_arch.h | 45 -- sysdeps/unix/sysv/linux/tls-internal.c | 10 - sysdeps/x86_64/Makefile | 7 - sysdeps/x86_64/chacha20-amd64-avx2.S | 328 ---------- sysdeps/x86_64/chacha20-amd64-sse2.S | 311 ---------- sysdeps/x86_64/chacha20_arch.h | 55 -- 28 files changed, 53 insertions(+), 2661 deletions(-) delete mode 100644 stdlib/arc4random.h delete mode 100644 stdlib/chacha20.c delete mode 100644 stdlib/tst-arc4random-chacha20.c delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S delete mode 100644 sysdeps/aarch64/chacha20_arch.h delete mode 100644 sysdeps/generic/chacha20_arch.h delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h delete mode 100644 
sysdeps/x86_64/chacha20-amd64-avx2.S delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S delete mode 100644 sysdeps/x86_64/chacha20_arch.h diff --git a/LICENSES b/LICENSES index cd04fb6e84..530893b1dc 100644 --- a/LICENSES +++ b/LICENSES @@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov> You should have received a copy of the GNU Lesser General Public License along with this library; if not, see <https://www.gnu.org/licenses/>. */ -\f -sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S, -sysdeps/x86_64/chacha20-amd64-avx2.S, and -sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and -sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt, -with the following notices: - -Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - -This file is part of Libgcrypt. - -Libgcrypt is free software; you can redistribute it and/or modify -it under the terms of the GNU Lesser General Public License as -published by the Free Software Foundation; either version 2.1 of -the License, or (at your option) any later version. - -Libgcrypt is distributed in the hope that it will be useful, -but WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -GNU Lesser General Public License for more details. - -You should have received a copy of the GNU Lesser General Public -License along with this program; if not, see <https://www.gnu.org/licenses/>. diff --git a/include/stdlib.h b/include/stdlib.h index cae7f7cdf8..db51f4a4f6 100644 --- a/include/stdlib.h +++ b/include/stdlib.h @@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform; libc_hidden_proto (__arc4random_uniform); extern void __arc4random_buf_internal (void *buffer, size_t len) attribute_hidden; -/* Called from the fork function to reinitialize the internal cipher state - in child process. 
*/ -extern void __arc4random_fork_subprocess (void) attribute_hidden; extern double __strtod_internal (const char *__restrict __nptr, char **__restrict __endptr, int __group) diff --git a/stdlib/Makefile b/stdlib/Makefile index a900962685..f7b25c1981 100644 --- a/stdlib/Makefile +++ b/stdlib/Makefile @@ -246,7 +246,6 @@ tests := \ # tests tests-internal := \ - tst-arc4random-chacha20 \ tst-strtod1i \ tst-strtod3 \ tst-strtod4 \ @@ -256,7 +255,6 @@ tests-internal := \ # tests-internal tests-static := \ - tst-arc4random-chacha20 \ tst-secure-getenv \ # tests-static diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c index 65547e79aa..80c55cde63 100644 --- a/stdlib/arc4random.c +++ b/stdlib/arc4random.c @@ -1,4 +1,4 @@ -/* Pseudo Random Number Generator based on ChaCha20. +/* Pseudo Random Number Generator Copyright (C) 2022 Free Software Foundation, Inc. This file is part of the GNU C Library. @@ -16,61 +16,14 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. */ -#include <arc4random.h> #include <errno.h> #include <not-cancel.h> #include <stdio.h> #include <stdlib.h> +#include <sys/poll.h> #include <sys/mman.h> #include <sys/param.h> #include <sys/random.h> -#include <tls-internal.h> - -/* arc4random keeps two counters: 'have' is the current valid bytes not yet - consumed in 'buf' while 'count' is the maximum number of bytes until a - reseed. - - Both the initial seed and reseed try to obtain entropy from the kernel - and abort the process if none could be obtained. - - The state 'buf' improves the usage of the cipher calls, allowing to call - optimized implementations (if the architecture provides it) and minimize - function call overhead. */ - -#include <chacha20.c> - -/* Called from the fork function to reset the state. 
*/ -void -__arc4random_fork_subprocess (void) -{ - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state; - if (state != NULL) - { - explicit_bzero (state, sizeof (*state)); - /* Force key init. */ - state->count = -1; - } -} - -/* Return the current thread random state or try to create one if there is - none available. In the case malloc can not allocate a state, arc4random - will try to get entropy with arc4random_getentropy. */ -static struct arc4random_state_t * -arc4random_get_state (void) -{ - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state; - if (state == NULL) - { - state = malloc (sizeof (struct arc4random_state_t)); - if (state != NULL) - { - /* Force key initialization on first call. */ - state->count = -1; - __glibc_tls_internal ()->rand_state = state; - } - } - return state; -} static void arc4random_getrandom_failure (void) @@ -78,106 +31,70 @@ arc4random_getrandom_failure (void) __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n"); } -static void -arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen) +void +__arc4random_buf (void *p, size_t n) { - chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf); + static bool have_getrandom = true, seen_initialized = false; + int fd; - /* Mix optional user provided data. */ - if (rnd != NULL) - { - size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); - for (size_t i = 0; i < m; i++) - state->buf[i] ^= rnd[i]; - } - - /* Immediately reinit for backtracking resistance. 
*/ - chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE); - explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); - state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); -} - -static void -arc4random_getentropy (void *rnd, size_t len) -{ - if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len) + if (n == 0) return; - int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom", - O_RDONLY | O_CLOEXEC)); - if (fd != -1) + for (;;) { - uint8_t *p = rnd; - uint8_t *end = p + len; - do - { - ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p)); - if (ret <= 0) - arc4random_getrandom_failure (); - p += ret; - } - while (p < end); + ssize_t l; - if (__close_nocancel (fd) == 0) - return; - } - arc4random_getrandom_failure (); -} + if (!have_getrandom) + break; -/* Check if the thread context STATE should be reseed with kernel entropy - depending of requested LEN bytes. If there is less than requested, - the state is either initialized or reseeded, otherwise the internal - counter subtract the requested length. */ -static void -arc4random_check_stir (struct arc4random_state_t *state, size_t len) -{ - if (state->count <= len || state->count == -1) - { - uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE]; - arc4random_getentropy (rnd, sizeof rnd); - - if (state->count == -1) - chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE); - else - arc4random_rekey (state, rnd, sizeof rnd); - - explicit_bzero (rnd, sizeof rnd); - - /* Invalidate the buf. */ - state->have = 0; - memset (state->buf, 0, sizeof state->buf); - state->count = CHACHA20_RESEED_SIZE; + l = __getrandom_nocancel (p, n, 0); + if (l > 0) + { + if ((size_t) l == n) + return; /* Done reading, success. */ + p = (uint8_t *) p + l; + n -= l; + continue; /* Interrupted by a signal; keep going. */ + } + else if (l == 0) + arc4random_getrandom_failure (); /* Weird, should never happen. 
*/ + else if (errno == ENOSYS) + { + have_getrandom = false; + break; /* No syscall, so fallback to /dev/urandom. */ + } + arc4random_getrandom_failure (); /* Unknown error, should never happen. */ } - else - state->count -= len; -} -void -__arc4random_buf (void *buffer, size_t len) -{ - struct arc4random_state_t *state = arc4random_get_state (); - if (__glibc_unlikely (state == NULL)) + if (!seen_initialized) { - arc4random_getentropy (buffer, len); - return; + struct pollfd pfd = { .events = POLLIN }; + pfd.fd = TEMP_FAILURE_RETRY ( + __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY)); + if (pfd.fd < 0) + arc4random_getrandom_failure (); + if (__poll (&pfd, 1, -1) < 0) + arc4random_getrandom_failure (); + if (__close_nocancel (pfd.fd) < 0) + arc4random_getrandom_failure (); + seen_initialized = true; } - arc4random_check_stir (state, len); - while (len > 0) + fd = TEMP_FAILURE_RETRY ( + __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY)); + if (fd < 0) + arc4random_getrandom_failure (); + do { - if (state->have > 0) - { - size_t m = MIN (len, state->have); - uint8_t *ks = state->buf + sizeof (state->buf) - state->have; - memcpy (buffer, ks, m); - explicit_bzero (ks, m); - buffer += m; - len -= m; - state->have -= m; - } - if (state->have == 0) - arc4random_rekey (state, NULL, 0); + ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n)); + if (l <= 0) + arc4random_getrandom_failure (); + p = (uint8_t *) p + l; + n -= l; } + while (n); + if (__close_nocancel (fd) < 0) + arc4random_getrandom_failure (); } libc_hidden_def (__arc4random_buf) weak_alias (__arc4random_buf, arc4random_buf) @@ -186,22 +103,7 @@ uint32_t __arc4random (void) { uint32_t r; - - struct arc4random_state_t *state = arc4random_get_state (); - if (__glibc_unlikely (state == NULL)) - { - arc4random_getentropy (&r, sizeof (uint32_t)); - return r; - } - - arc4random_check_stir (state, sizeof (uint32_t)); - if (state->have < sizeof (uint32_t)) - arc4random_rekey 
(state, NULL, 0); - uint8_t *ks = state->buf + sizeof (state->buf) - state->have; - memcpy (&r, ks, sizeof (uint32_t)); - memset (ks, 0, sizeof (uint32_t)); - state->have -= sizeof (uint32_t); - + __arc4random_buf (&r, sizeof (r)); return r; } libc_hidden_def (__arc4random) diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h deleted file mode 100644 index cd39389c19..0000000000 --- a/stdlib/arc4random.h +++ /dev/null @@ -1,48 +0,0 @@ -/* Arc4random definition used on TLS. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#ifndef _CHACHA20_H -#define _CHACHA20_H - -#include <stddef.h> -#include <stdint.h> - -/* Internal ChaCha20 state. */ -#define CHACHA20_STATE_LEN 16 -#define CHACHA20_BLOCK_SIZE 64 - -/* Maximum number bytes until reseed (16 MB). */ -#define CHACHA20_RESEED_SIZE (16 * 1024 * 1024) - -/* Internal arc4random buffer, used on each feedback step so offer some - backtracking protection and to allow better used of vectorized - chacha20 implementations. 
*/ -#define CHACHA20_BUFSIZE (8 * CHACHA20_BLOCK_SIZE) - -_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE"); - -struct arc4random_state_t -{ - uint32_t ctx[CHACHA20_STATE_LEN]; - size_t have; - size_t count; - uint8_t buf[CHACHA20_BUFSIZE]; -}; - -#endif diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c deleted file mode 100644 index 2745a81315..0000000000 --- a/stdlib/chacha20.c +++ /dev/null @@ -1,191 +0,0 @@ -/* Generic ChaCha20 implementation (used on arc4random). - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <array_length.h> -#include <endian.h> -#include <stddef.h> -#include <stdint.h> -#include <string.h> - -/* 32-bit stream position, then 96-bit nonce. */ -#define CHACHA20_IV_SIZE 16 -#define CHACHA20_KEY_SIZE 32 - -#define CHACHA20_STATE_LEN 16 - -/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final - XOR of the keystream with the plaintext because the plaintext is a - stream of zeros. 
*/ - -enum chacha20_constants -{ - CHACHA20_CONSTANT_EXPA = 0x61707865U, - CHACHA20_CONSTANT_ND_3 = 0x3320646eU, - CHACHA20_CONSTANT_2_BY = 0x79622d32U, - CHACHA20_CONSTANT_TE_K = 0x6b206574U -}; - -static inline uint32_t -read_unaligned_32 (const uint8_t *p) -{ - uint32_t r; - memcpy (&r, p, sizeof (r)); - return r; -} - -static inline void -write_unaligned_32 (uint8_t *p, uint32_t v) -{ - memcpy (p, &v, sizeof (v)); -} - -#if __BYTE_ORDER == __BIG_ENDIAN -# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p)) -# define set_state(v) __builtin_bswap32 ((v)) -#else -# define read_unaligned_le32(p) read_unaligned_32 ((p)) -# define set_state(v) (v) -#endif - -static inline void -chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv) -{ - state[0] = CHACHA20_CONSTANT_EXPA; - state[1] = CHACHA20_CONSTANT_ND_3; - state[2] = CHACHA20_CONSTANT_2_BY; - state[3] = CHACHA20_CONSTANT_TE_K; - - state[4] = read_unaligned_le32 (key + 0 * sizeof (uint32_t)); - state[5] = read_unaligned_le32 (key + 1 * sizeof (uint32_t)); - state[6] = read_unaligned_le32 (key + 2 * sizeof (uint32_t)); - state[7] = read_unaligned_le32 (key + 3 * sizeof (uint32_t)); - state[8] = read_unaligned_le32 (key + 4 * sizeof (uint32_t)); - state[9] = read_unaligned_le32 (key + 5 * sizeof (uint32_t)); - state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t)); - state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t)); - - state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t)); - state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t)); - state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t)); - state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t)); -} - -static inline uint32_t -rotl32 (unsigned int shift, uint32_t word) -{ - return (word << (shift & 31)) | (word >> ((-shift) & 31)); -} - -static void -state_final (const uint8_t *src, uint8_t *dst, uint32_t v) -{ -#ifdef CHACHA20_XOR_FINAL - v ^= read_unaligned_32 (src); -#endif - 
write_unaligned_32 (dst, v); -} - -static inline void -chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src) -{ - uint32_t x0, x1, x2, x3, x4, x5, x6, x7; - uint32_t x8, x9, x10, x11, x12, x13, x14, x15; - - x0 = state[0]; - x1 = state[1]; - x2 = state[2]; - x3 = state[3]; - x4 = state[4]; - x5 = state[5]; - x6 = state[6]; - x7 = state[7]; - x8 = state[8]; - x9 = state[9]; - x10 = state[10]; - x11 = state[11]; - x12 = state[12]; - x13 = state[13]; - x14 = state[14]; - x15 = state[15]; - - for (int i = 0; i < 20; i += 2) - { -#define QROUND(_x0, _x1, _x2, _x3) \ - do { \ - _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); \ - _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); \ - _x0 = _x0 + _x1; _x3 = rotl32 (8, (_x0 ^ _x3)); \ - _x2 = _x2 + _x3; _x1 = rotl32 (7, (_x1 ^ _x2)); \ - } while(0) - - QROUND (x0, x4, x8, x12); - QROUND (x1, x5, x9, x13); - QROUND (x2, x6, x10, x14); - QROUND (x3, x7, x11, x15); - - QROUND (x0, x5, x10, x15); - QROUND (x1, x6, x11, x12); - QROUND (x2, x7, x8, x13); - QROUND (x3, x4, x9, x14); - } - - state_final (&src[0], &dst[0], set_state (x0 + state[0])); - state_final (&src[4], &dst[4], set_state (x1 + state[1])); - state_final (&src[8], &dst[8], set_state (x2 + state[2])); - state_final (&src[12], &dst[12], set_state (x3 + state[3])); - state_final (&src[16], &dst[16], set_state (x4 + state[4])); - state_final (&src[20], &dst[20], set_state (x5 + state[5])); - state_final (&src[24], &dst[24], set_state (x6 + state[6])); - state_final (&src[28], &dst[28], set_state (x7 + state[7])); - state_final (&src[32], &dst[32], set_state (x8 + state[8])); - state_final (&src[36], &dst[36], set_state (x9 + state[9])); - state_final (&src[40], &dst[40], set_state (x10 + state[10])); - state_final (&src[44], &dst[44], set_state (x11 + state[11])); - state_final (&src[48], &dst[48], set_state (x12 + state[12])); - state_final (&src[52], &dst[52], set_state (x13 + state[13])); - state_final (&src[56], &dst[56], set_state (x14 + state[14])); 
- state_final (&src[60], &dst[60], set_state (x15 + state[15])); - - state[12]++; -} - -static void -__attribute_maybe_unused__ -chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - while (bytes >= CHACHA20_BLOCK_SIZE) - { - chacha20_block (state, dst, src); - - bytes -= CHACHA20_BLOCK_SIZE; - dst += CHACHA20_BLOCK_SIZE; - src += CHACHA20_BLOCK_SIZE; - } - - if (__glibc_unlikely (bytes != 0)) - { - uint8_t stream[CHACHA20_BLOCK_SIZE]; - chacha20_block (state, stream, src); - memcpy (dst, stream, bytes); - explicit_bzero (stream, sizeof stream); - } -} - -/* Get the architecture optimized version. */ -#include <chacha20_arch.h> diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c deleted file mode 100644 index 45ba54920d..0000000000 --- a/stdlib/tst-arc4random-chacha20.c +++ /dev/null @@ -1,167 +0,0 @@ -/* Basic tests for chacha20 cypher used in arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <arc4random.h> -#include <support/check.h> -#include <sys/cdefs.h> - -/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random - actual does. 
*/ -#include <chacha20.c> - -static int -do_test (void) -{ - const uint8_t key[CHACHA20_KEY_SIZE] = - { - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - }; - const uint8_t iv[CHACHA20_IV_SIZE] = - { - 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - }; - const uint8_t expected1[CHACHA20_BUFSIZE] = - { - 0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a, - 0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d, - 0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda, - 0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f, - 0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1, - 0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07, - 0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73, - 0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69, - 0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21, - 0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8, - 0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac, - 0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6, - 0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0, - 0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3, - 0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f, - 0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e, - 0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71, - 0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3, - 0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5, - 0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34, - 0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d, - 0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c, - 0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 
0x6a, 0x1e, 0x6c, 0x31, - 0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad, - 0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0, - 0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72, - 0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16, - 0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9, - 0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a, - 0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc, - 0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2, - 0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb, - 0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10, - 0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b, - 0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2, - 0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17, - 0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca, - 0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84, - 0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3, - 0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1, - 0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7, - 0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9, - 0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3, - 0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e, - 0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb, - 0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8, - 0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb - }; - - const uint8_t expected2[CHACHA20_BUFSIZE] = - { - 0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36, - 0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22, - 0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50, - 0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10, - 0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 
0xb8, 0x94, 0xbb, 0xef, - 0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30, - 0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c, - 0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6, - 0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e, - 0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90, - 0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3, - 0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba, - 0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4, - 0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea, - 0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a, - 0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80, - 0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2, - 0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8, - 0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce, - 0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb, - 0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9, - 0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc, - 0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86, - 0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0, - 0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4, - 0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1, - 0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4, - 0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f, - 0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57, - 0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97, - 0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef, - 0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13, - 0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65, - 0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30, - 
0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6, - 0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee, - 0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c, - 0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66, - 0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a, - 0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c, - 0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65, - 0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b, - 0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88, - 0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1, - 0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0, - 0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36, - 0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18 - }; - - /* Check with the expected internal arc4random keystream buffer. Some - architecture optimizations expects a buffer with a minimum size which - is a multiple of then ChaCha20 blocksize, so they might not be prepared - to handle smaller buffers. */ - - uint8_t output[CHACHA20_BUFSIZE]; - - uint32_t state[CHACHA20_STATE_LEN]; - chacha20_init (state, key, iv); - - /* Check with the initial state. */ - uint8_t input[CHACHA20_BUFSIZE] = { 0 }; - - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); - TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE); - - /* And on the next round. 
*/ - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); - TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE); - - return 0; -} - -#include <support/test-driver.c> diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile index 7dfd1b62dd..17fb1c5b72 100644 --- a/sysdeps/aarch64/Makefile +++ b/sysdeps/aarch64/Makefile @@ -51,10 +51,6 @@ ifeq ($(subdir),csu) gen-as-const-headers += tlsdesc.sym endif -ifeq ($(subdir),stdlib) -sysdep_routines += chacha20-aarch64 -endif - ifeq ($(subdir),gmon) CFLAGS-mcount.c += -mgeneral-regs-only endif diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S deleted file mode 100644 index cce5291c5c..0000000000 --- a/sysdeps/aarch64/chacha20-aarch64.S +++ /dev/null @@ -1,314 +0,0 @@ -/* Optimized AArch64 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. 
- - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. - */ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. */ - -#include <sysdep.h> - -/* Only LE is supported. */ -#ifdef __AARCH64EL__ - -#define GET_DATA_POINTER(reg, name) \ - adrp reg, name ; \ - add reg, reg, :lo12:name - -/* 'ret' instruction replacement for straight-line speculation mitigation */ -#define ret_spec_stop \ - ret; dsb sy; isb; - -.cpu generic+simd - -.text - -/* register macros */ -#define INPUT x0 -#define DST x1 -#define SRC x2 -#define NBLKS x3 -#define ROUND x4 -#define INPUT_CTR x5 -#define INPUT_POS x6 -#define CTR x7 - -/* vector registers */ -#define X0 v16 -#define X4 v17 -#define X8 v18 -#define X12 v19 - -#define X1 v20 -#define X5 v21 - -#define X9 v22 -#define X13 v23 -#define X2 v24 -#define X6 v25 - -#define X3 v26 -#define X7 v27 -#define X11 v28 -#define X15 v29 - -#define X10 v30 -#define X14 v31 - -#define VCTR v0 -#define VTMP0 v1 -#define VTMP1 v2 -#define VTMP2 v3 -#define VTMP3 v4 -#define X12_TMP v5 -#define X13_TMP v6 -#define ROT8 v7 - -/********************************************************************** - helper macros - **********************************************************************/ - -#define _(...) 
__VA_ARGS__ - -#define vpunpckldq(s1, s2, dst) \ - zip1 dst.4s, s2.4s, s1.4s; - -#define vpunpckhdq(s1, s2, dst) \ - zip2 dst.4s, s2.4s, s1.4s; - -#define vpunpcklqdq(s1, s2, dst) \ - zip1 dst.2d, s2.2d, s1.2d; - -#define vpunpckhqdq(s1, s2, dst) \ - zip2 dst.2d, s2.2d, s1.2d; - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \ - vpunpckhdq(x1, x0, t2); \ - vpunpckldq(x1, x0, x0); \ - \ - vpunpckldq(x3, x2, t1); \ - vpunpckhdq(x3, x2, x2); \ - \ - vpunpckhqdq(t1, x0, x1); \ - vpunpcklqdq(t1, x0, x0); \ - \ - vpunpckhqdq(x2, t2, x3); \ - vpunpcklqdq(x2, t2, x2); - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define XOR(d,s1,s2) \ - eor d.16b, s2.16b, s1.16b; - -#define PLUS(ds,s) \ - add ds.4s, ds.4s, s.4s; - -#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \ - shl dst1.4s, src1.4s, #(c); \ - shl dst2.4s, src2.4s, #(c); \ - shl dst3.4s, src3.4s, #(c); \ - shl dst4.4s, src4.4s, #(c); \ - sri dst1.4s, src1.4s, #(32 - (c)); \ - sri dst2.4s, src2.4s, #(32 - (c)); \ - sri dst3.4s, src3.4s, #(32 - (c)); \ - sri dst4.4s, src4.4s, #(32 - (c)); - -#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ - tbl dst1.16b, {src1.16b}, ROT8.16b; \ - tbl dst2.16b, {src2.16b}, ROT8.16b; \ - tbl dst3.16b, {src3.16b}, ROT8.16b; \ - tbl dst4.16b, {src4.16b}, ROT8.16b; - -#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ - rev32 dst1.8h, src1.8h; \ - rev32 dst2.8h, src2.8h; \ - rev32 dst3.8h, src3.8h; \ - rev32 dst4.8h, src4.8h; - -#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \ - PLUS(a1,b1); PLUS(a2,b2); \ - PLUS(a3,b3); PLUS(a4,b4); \ - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ - ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4); \ - PLUS(c1,d1); PLUS(c2,d2); \ - PLUS(c3,d3); PLUS(c4,d4); \ - XOR(tmp1,b1,c1); 
XOR(tmp2,b2,c2); \ - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ - ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4) \ - PLUS(a1,b1); PLUS(a2,b2); \ - PLUS(a3,b3); PLUS(a4,b4); \ - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ - ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4) \ - PLUS(c1,d1); PLUS(c2,d2); \ - PLUS(c3,d3); PLUS(c4,d4); \ - XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \ - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ - ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4) \ - -.align 4 -L(__chacha20_blocks4_data_inc_counter): - .long 0,1,2,3 - -.align 4 -L(__chacha20_blocks4_data_rot8): - .byte 3,0,1,2 - .byte 7,4,5,6 - .byte 11,8,9,10 - .byte 15,12,13,14 - -.hidden __chacha20_neon_blocks4 -ENTRY (__chacha20_neon_blocks4) - /* input: - * x0: input - * x1: dst - * x2: src - * x3: nblks (multiple of 4) - */ - - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8)) - add INPUT_CTR, INPUT, #(12*4); - ld1 {ROT8.16b}, [CTR]; - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter)) - mov INPUT_POS, INPUT; - ld1 {VCTR.16b}, [CTR]; - -L(loop4): - /* Construct counter vectors X12 and X13 */ - - ld1 {X15.16b}, [INPUT_CTR]; - mov ROUND, #20; - ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS]; - - dup X12.4s, X15.s[0]; - dup X13.4s, X15.s[1]; - ldr CTR, [INPUT_CTR]; - add X12.4s, X12.4s, VCTR.4s; - dup X0.4s, VTMP1.s[0]; - dup X1.4s, VTMP1.s[1]; - dup X2.4s, VTMP1.s[2]; - dup X3.4s, VTMP1.s[3]; - dup X14.4s, X15.s[2]; - cmhi VTMP0.4s, VCTR.4s, X12.4s; - dup X15.4s, X15.s[3]; - add CTR, CTR, #4; /* Update counter */ - dup X4.4s, VTMP2.s[0]; - dup X5.4s, VTMP2.s[1]; - dup X6.4s, VTMP2.s[2]; - dup X7.4s, VTMP2.s[3]; - sub X13.4s, X13.4s, VTMP0.4s; - dup X8.4s, VTMP3.s[0]; - dup X9.4s, VTMP3.s[1]; - dup X10.4s, VTMP3.s[2]; - dup X11.4s, VTMP3.s[3]; - mov X12_TMP.16b, X12.16b; - mov X13_TMP.16b, X13.16b; - str CTR, [INPUT_CTR]; - -L(round2): - subs ROUND, ROUND, #2 - QUARTERROUND4(X0, X4, X8, X12, X1, X5, X9, X13, - X2, X6, X10, X14, X3, X7, X11, X15, - 
tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) - QUARTERROUND4(X0, X5, X10, X15, X1, X6, X11, X12, - X2, X7, X8, X13, X3, X4, X9, X14, - tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) - b.ne L(round2); - - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32; - - PLUS(X12, X12_TMP); /* INPUT + 12 * 4 + counter */ - PLUS(X13, X13_TMP); /* INPUT + 13 * 4 + counter */ - - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */ - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */ - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */ - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */ - PLUS(X0, VTMP2); - PLUS(X1, VTMP3); - PLUS(X2, X12_TMP); - PLUS(X3, X13_TMP); - - dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */ - dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */ - dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */ - dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */ - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS]; - mov INPUT_POS, INPUT; - PLUS(X4, VTMP2); - PLUS(X5, VTMP3); - PLUS(X6, X12_TMP); - PLUS(X7, X13_TMP); - - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */ - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */ - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */ - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */ - dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */ - dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */ - PLUS(X8, VTMP2); - PLUS(X9, VTMP3); - PLUS(X10, X12_TMP); - PLUS(X11, X13_TMP); - PLUS(X14, VTMP0); - PLUS(X15, VTMP1); - - transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2); - transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2); - transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2); - transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2); - - subs NBLKS, NBLKS, #4; - - st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64 - st1 {X1.16b,X5.16b}, [DST], #32; - st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64 - st1 {X10.16b,X14.16b}, [DST], #32; - st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64; - - b.ne L(loop4); - - ret_spec_stop -END (__chacha20_neon_blocks4) - -#endif diff --git a/sysdeps/aarch64/chacha20_arch.h 
b/sysdeps/aarch64/chacha20_arch.h deleted file mode 100644 index 37dbb917f1..0000000000 --- a/sysdeps/aarch64/chacha20_arch.h +++ /dev/null @@ -1,40 +0,0 @@ -/* Chacha20 implementation, used on arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <ldsodefs.h> -#include <stdbool.h> - -unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, - "CHACHA20_BUFSIZE not multiple of 4"); - _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4, - "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4"); -#ifdef __AARCH64EL__ - __chacha20_neon_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -#else - chacha20_crypt_generic (state, dst, src, bytes); -#endif -} diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h deleted file mode 100644 index 1b4559ccbc..0000000000 --- a/sysdeps/generic/chacha20_arch.h +++ /dev/null @@ -1,24 +0,0 @@ -/* Chacha20 implementation, generic interface for encrypt. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. 
- - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -static inline void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - chacha20_crypt_generic (state, dst, src, bytes); -} diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c index 8a0f37d509..b32b31b5a9 100644 --- a/sysdeps/generic/tls-internal.c +++ b/sysdeps/generic/tls-internal.c @@ -16,7 +16,6 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. */ -#include <stdlib/arc4random.h> #include <string.h> #include <tls-internal.h> @@ -27,13 +26,4 @@ __glibc_tls_internal_free (void) { free (__tls_internal.strsignal_buf); free (__tls_internal.strerror_l_buf); - - if (__tls_internal.rand_state != NULL) - { - /* Clear any lingering random state prior so if the thread stack is - cached it won't leak any data. 
*/ - explicit_bzero (__tls_internal.rand_state, - sizeof (*__tls_internal.rand_state)); - free (__tls_internal.rand_state); - } } diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c index 667068c8cf..e60b86fab1 100644 --- a/sysdeps/mach/hurd/_Fork.c +++ b/sysdeps/mach/hurd/_Fork.c @@ -662,8 +662,6 @@ retry: _hurd_malloc_fork_child (); call_function_static_weak (__malloc_fork_unlock_child); - call_function_static_weak (__arc4random_fork_subprocess); - /* Run things that want to run in the child task to set up. */ RUN_HOOK (_hurd_fork_child_hook, ()); diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c index 7dc02569f6..dd568992e2 100644 --- a/sysdeps/nptl/_Fork.c +++ b/sysdeps/nptl/_Fork.c @@ -43,8 +43,6 @@ _Fork (void) self->robust_head.list = &self->robust_head; INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head, sizeof (struct robust_list_head)); - - call_function_static_weak (__arc4random_fork_subprocess); } return pid; } diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile deleted file mode 100644 index 8c75165f7f..0000000000 --- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile +++ /dev/null @@ -1,4 +0,0 @@ -ifeq ($(subdir),stdlib) -sysdep_routines += chacha20-ppc -CFLAGS-chacha20-ppc.c += -mcpu=power8 -endif diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c deleted file mode 100644 index cf9e735326..0000000000 --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c +++ /dev/null @@ -1 +0,0 @@ -#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h deleted file mode 100644 index 08494dc045..0000000000 --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h +++ /dev/null @@ -1,42 +0,0 @@ -/* PowerPC optimization for ChaCha20. 
- Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <stdbool.h> -#include <ldsodefs.h> - -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static void -chacha20_crypt (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, - "CHACHA20_BUFSIZE not multiple of 4"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); - - unsigned long int hwcap = GLRO(dl_hwcap); - unsigned long int hwcap2 = GLRO(dl_hwcap2); - if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC) - __chacha20_power8_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); - else - chacha20_crypt_generic (state, dst, src, bytes); -} diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile index abb0aa3f11..71a59529f3 100644 --- a/sysdeps/powerpc/powerpc64/power8/Makefile +++ b/sysdeps/powerpc/powerpc64/power8/Makefile @@ -1,8 +1,3 @@ ifeq ($(subdir),string) sysdep_routines += strcasestr-ppc64 endif - -ifeq ($(subdir),stdlib) -sysdep_routines += chacha20-ppc -CFLAGS-chacha20-ppc.c += -mcpu=power8 -endif diff --git 
a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c deleted file mode 100644 index 0bbdcb9363..0000000000 --- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c +++ /dev/null @@ -1,256 +0,0 @@ -/* Optimized PowerPC implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20 - Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. 
- */ - -#include <altivec.h> -#include <endian.h> -#include <stddef.h> -#include <stdint.h> -#include <sys/cdefs.h> - -typedef vector unsigned char vector16x_u8; -typedef vector unsigned int vector4x_u32; -typedef vector unsigned long long vector2x_u64; - -#if __BYTE_ORDER == __BIG_ENDIAN -static const vector16x_u8 le_bswap_const = - { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 }; -#endif - -static inline vector4x_u32 -vec_rol_elems (vector4x_u32 v, unsigned int idx) -{ -#if __BYTE_ORDER != __BIG_ENDIAN - return vec_sld (v, v, (16 - (4 * idx)) & 15); -#else - return vec_sld (v, v, (4 * idx) & 15); -#endif -} - -static inline vector4x_u32 -vec_load_le (unsigned long offset, const unsigned char *ptr) -{ - vector4x_u32 vec; - vec = vec_vsx_ld (offset, (const uint32_t *)ptr); -#if __BYTE_ORDER == __BIG_ENDIAN - vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec, - le_bswap_const); -#endif - return vec; -} - -static inline void -vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr) -{ -#if __BYTE_ORDER == __BIG_ENDIAN - vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec, - le_bswap_const); -#endif - vec_vsx_st (vec, offset, (uint32_t *)ptr); -} - - -static inline vector4x_u32 -vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a) -{ -#if __BYTE_ORDER == __BIG_ENDIAN - static const vector16x_u8 swap32 = - { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 }; - vector2x_u64 vec, add, sum; - - vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32); - add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32); - sum = vec + add; - return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32); -#else - return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a)); -#endif -} - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define ROTATE(v1,rolv) \ - 
__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv)) - -#define PLUS(ds,s) \ - ((ds) += (s)) - -#define XOR(ds,s) \ - ((ds) ^= (s)) - -#define ADD_U64(v,a) \ - (v = vec_add_ctr_u64(v, a)) - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0, x1, x2, x3) ({ \ - vector4x_u32 t1 = vec_mergeh(x0, x2); \ - vector4x_u32 t2 = vec_mergel(x0, x2); \ - vector4x_u32 t3 = vec_mergeh(x1, x3); \ - x3 = vec_mergel(x1, x3); \ - x0 = vec_mergeh(t1, t3); \ - x1 = vec_mergel(t1, t3); \ - x2 = vec_mergeh(t2, x3); \ - x3 = vec_mergel(t2, x3); \ - }) - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2) \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE(d1, rotate_16); ROTATE(d2, rotate_16); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE(b1, rotate_12); ROTATE(b2, rotate_12); \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE(d1, rotate_8); ROTATE(d2, rotate_8); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE(b1, rotate_7); ROTATE(b2, rotate_7); - -unsigned int attribute_hidden -__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t nblks) -{ - vector4x_u32 counters_0123 = { 0, 1, 2, 3 }; - vector4x_u32 counter_4 = { 4, 0, 0, 0 }; - vector4x_u32 rotate_16 = { 16, 16, 16, 16 }; - vector4x_u32 rotate_12 = { 12, 12, 12, 12 }; - vector4x_u32 rotate_8 = { 8, 8, 8, 8 }; - vector4x_u32 rotate_7 = { 7, 7, 7, 7 }; - vector4x_u32 state0, state1, state2, state3; - vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7; - vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15; - vector4x_u32 tmp; - int i; - - /* Force preload of constants to vector registers. 
*/ - __asm__ ("": "+v" (counters_0123) :: "memory"); - __asm__ ("": "+v" (counter_4) :: "memory"); - __asm__ ("": "+v" (rotate_16) :: "memory"); - __asm__ ("": "+v" (rotate_12) :: "memory"); - __asm__ ("": "+v" (rotate_8) :: "memory"); - __asm__ ("": "+v" (rotate_7) :: "memory"); - - state0 = vec_vsx_ld (0 * 16, state); - state1 = vec_vsx_ld (1 * 16, state); - state2 = vec_vsx_ld (2 * 16, state); - state3 = vec_vsx_ld (3 * 16, state); - - do - { - v0 = vec_splat (state0, 0); - v1 = vec_splat (state0, 1); - v2 = vec_splat (state0, 2); - v3 = vec_splat (state0, 3); - v4 = vec_splat (state1, 0); - v5 = vec_splat (state1, 1); - v6 = vec_splat (state1, 2); - v7 = vec_splat (state1, 3); - v8 = vec_splat (state2, 0); - v9 = vec_splat (state2, 1); - v10 = vec_splat (state2, 2); - v11 = vec_splat (state2, 3); - v12 = vec_splat (state3, 0); - v13 = vec_splat (state3, 1); - v14 = vec_splat (state3, 2); - v15 = vec_splat (state3, 3); - - v12 += counters_0123; - v13 -= vec_cmplt (v12, counters_0123); - - for (i = 20; i > 0; i -= 2) - { - QUARTERROUND2 (v0, v4, v8, v12, v1, v5, v9, v13) - QUARTERROUND2 (v2, v6, v10, v14, v3, v7, v11, v15) - QUARTERROUND2 (v0, v5, v10, v15, v1, v6, v11, v12) - QUARTERROUND2 (v2, v7, v8, v13, v3, v4, v9, v14) - } - - v0 += vec_splat (state0, 0); - v1 += vec_splat (state0, 1); - v2 += vec_splat (state0, 2); - v3 += vec_splat (state0, 3); - v4 += vec_splat (state1, 0); - v5 += vec_splat (state1, 1); - v6 += vec_splat (state1, 2); - v7 += vec_splat (state1, 3); - v8 += vec_splat (state2, 0); - v9 += vec_splat (state2, 1); - v10 += vec_splat (state2, 2); - v11 += vec_splat (state2, 3); - tmp = vec_splat( state3, 0); - tmp += counters_0123; - v12 += tmp; - v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123); - v14 += vec_splat (state3, 2); - v15 += vec_splat (state3, 3); - ADD_U64 (state3, counter_4); - - transpose_4x4 (v0, v1, v2, v3); - transpose_4x4 (v4, v5, v6, v7); - transpose_4x4 (v8, v9, v10, v11); - transpose_4x4 (v12, v13, v14, v15); 
- - vec_store_le (v0, (64 * 0 + 16 * 0), dst); - vec_store_le (v1, (64 * 1 + 16 * 0), dst); - vec_store_le (v2, (64 * 2 + 16 * 0), dst); - vec_store_le (v3, (64 * 3 + 16 * 0), dst); - - vec_store_le (v4, (64 * 0 + 16 * 1), dst); - vec_store_le (v5, (64 * 1 + 16 * 1), dst); - vec_store_le (v6, (64 * 2 + 16 * 1), dst); - vec_store_le (v7, (64 * 3 + 16 * 1), dst); - - vec_store_le (v8, (64 * 0 + 16 * 2), dst); - vec_store_le (v9, (64 * 1 + 16 * 2), dst); - vec_store_le (v10, (64 * 2 + 16 * 2), dst); - vec_store_le (v11, (64 * 3 + 16 * 2), dst); - - vec_store_le (v12, (64 * 0 + 16 * 3), dst); - vec_store_le (v13, (64 * 1 + 16 * 3), dst); - vec_store_le (v14, (64 * 2 + 16 * 3), dst); - vec_store_le (v15, (64 * 3 + 16 * 3), dst); - - src += 4*64; - dst += 4*64; - - nblks -= 4; - } - while (nblks); - - vec_vsx_st (state3, 3 * 16, state); - - return 0; -} diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h deleted file mode 100644 index ded06762b6..0000000000 --- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h +++ /dev/null @@ -1,37 +0,0 @@ -/* PowerPC optimization for ChaCha20. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <stdbool.h> -#include <ldsodefs.h> - -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static void -chacha20_crypt (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, - "CHACHA20_BUFSIZE not multiple of 4"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); - - __chacha20_power8_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -} diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile index 96c110f490..66ed844e68 100644 --- a/sysdeps/s390/s390-64/Makefile +++ b/sysdeps/s390/s390-64/Makefile @@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache endif endif # $(subdir) == elf - -ifeq ($(subdir),stdlib) -sysdep_routines += \ - chacha20-s390x \ - # sysdep_routines -endif diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S deleted file mode 100644 index e38504d370..0000000000 --- a/sysdeps/s390/s390-64/chacha20-s390x.S +++ /dev/null @@ -1,573 +0,0 @@ -/* Optimized s390x implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -/* chacha20-s390x.S - zSeries implementation of ChaCha20 cipher - - Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. - */ - -#include <sysdep.h> - -#ifdef HAVE_S390_VX_ASM_SUPPORT - -/* CFA expressions are used for pointing CFA and registers to - * SP relative offsets. */ -# define DW_REGNO_SP 15 - -/* Fixed length encoding used for integers for now. 
*/ -# define DW_SLEB128_7BIT(value) \ - 0x00|((value) & 0x7f) -# define DW_SLEB128_28BIT(value) \ - 0x80|((value)&0x7f), \ - 0x80|(((value)>>7)&0x7f), \ - 0x80|(((value)>>14)&0x7f), \ - 0x00|(((value)>>21)&0x7f) - -# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \ - .cfi_escape \ - 0x0f, /* DW_CFA_def_cfa_expression */ \ - DW_SLEB128_7BIT(11), /* length */ \ - 0x7f, /* DW_OP_breg15, rsp + constant */ \ - DW_SLEB128_28BIT(rsp_offs), \ - 0x06, /* DW_OP_deref */ \ - 0x23, /* DW_OP_plus_constu */ \ - DW_SLEB128_28BIT((cfa_depth)+160) - -.machine "z13+vx" -.text - -.balign 16 -.Lconsts: -.Lwordswap: - .byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3 -.Lbswap128: - .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 -.Lbswap32: - .byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 -.Lone: - .long 0, 0, 0, 1 -.Ladd_counter_0123: - .long 0, 1, 2, 3 -.Ladd_counter_4567: - .long 4, 5, 6, 7 - -/* register macros */ -#define INPUT %r2 -#define DST %r3 -#define SRC %r4 -#define NBLKS %r0 -#define ROUND %r1 - -/* stack structure */ - -#define STACK_FRAME_STD (8 * 16 + 8 * 4) -#define STACK_FRAME_F8_F15 (8 * 8) -#define STACK_FRAME_Y0_Y15 (16 * 16) -#define STACK_FRAME_CTR (4 * 16) -#define STACK_FRAME_PARAMS (6 * 8) - -#define STACK_MAX (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \ - STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \ - STACK_FRAME_PARAMS) - -#define STACK_F8 (STACK_MAX - STACK_FRAME_F8_F15) -#define STACK_F9 (STACK_F8 + 8) -#define STACK_F10 (STACK_F9 + 8) -#define STACK_F11 (STACK_F10 + 8) -#define STACK_F12 (STACK_F11 + 8) -#define STACK_F13 (STACK_F12 + 8) -#define STACK_F14 (STACK_F13 + 8) -#define STACK_F15 (STACK_F14 + 8) -#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15) -#define STACK_CTR (STACK_Y0_Y15 - STACK_FRAME_CTR) -#define STACK_INPUT (STACK_CTR - STACK_FRAME_PARAMS) -#define STACK_DST (STACK_INPUT + 8) -#define STACK_SRC (STACK_DST + 8) -#define STACK_NBLKS (STACK_SRC + 8) -#define STACK_POCTX (STACK_NBLKS + 8) -#define STACK_POSRC 
(STACK_POCTX + 8) - -#define STACK_G0_H3 STACK_Y0_Y15 - -/* vector registers */ -#define A0 %v0 -#define A1 %v1 -#define A2 %v2 -#define A3 %v3 - -#define B0 %v4 -#define B1 %v5 -#define B2 %v6 -#define B3 %v7 - -#define C0 %v8 -#define C1 %v9 -#define C2 %v10 -#define C3 %v11 - -#define D0 %v12 -#define D1 %v13 -#define D2 %v14 -#define D3 %v15 - -#define E0 %v16 -#define E1 %v17 -#define E2 %v18 -#define E3 %v19 - -#define F0 %v20 -#define F1 %v21 -#define F2 %v22 -#define F3 %v23 - -#define G0 %v24 -#define G1 %v25 -#define G2 %v26 -#define G3 %v27 - -#define H0 %v28 -#define H1 %v29 -#define H2 %v30 -#define H3 %v31 - -#define IO0 E0 -#define IO1 E1 -#define IO2 E2 -#define IO3 E3 -#define IO4 F0 -#define IO5 F1 -#define IO6 F2 -#define IO7 F3 - -#define S0 G0 -#define S1 G1 -#define S2 G2 -#define S3 G3 - -#define TMP0 H0 -#define TMP1 H1 -#define TMP2 H2 -#define TMP3 H3 - -#define X0 A0 -#define X1 A1 -#define X2 A2 -#define X3 A3 -#define X4 B0 -#define X5 B1 -#define X6 B2 -#define X7 B3 -#define X8 C0 -#define X9 C1 -#define X10 C2 -#define X11 C3 -#define X12 D0 -#define X13 D1 -#define X14 D2 -#define X15 D3 - -#define Y0 E0 -#define Y1 E1 -#define Y2 E2 -#define Y3 E3 -#define Y4 F0 -#define Y5 F1 -#define Y6 F2 -#define Y7 F3 -#define Y8 G0 -#define Y9 G1 -#define Y10 G2 -#define Y11 G3 -#define Y12 H0 -#define Y13 H1 -#define Y14 H2 -#define Y15 H3 - -/********************************************************************** - helper macros - **********************************************************************/ - -#define _ /*_*/ - -#define START_STACK(last_r) \ - lgr %r0, %r15; \ - lghi %r1, ~15; \ - stmg %r6, last_r, 6 * 8(%r15); \ - aghi %r0, -STACK_MAX; \ - ngr %r0, %r1; \ - lgr %r1, %r15; \ - cfi_def_cfa_register(1); \ - lgr %r15, %r0; \ - stg %r1, 0(%r15); \ - cfi_cfa_on_stack(0, 0); \ - std %f8, STACK_F8(%r15); \ - std %f9, STACK_F9(%r15); \ - std %f10, STACK_F10(%r15); \ - std %f11, STACK_F11(%r15); \ - std %f12, STACK_F12(%r15); \ - std %f13, 
STACK_F13(%r15); \ - std %f14, STACK_F14(%r15); \ - std %f15, STACK_F15(%r15); - -#define END_STACK(last_r) \ - lg %r1, 0(%r15); \ - ld %f8, STACK_F8(%r15); \ - ld %f9, STACK_F9(%r15); \ - ld %f10, STACK_F10(%r15); \ - ld %f11, STACK_F11(%r15); \ - ld %f12, STACK_F12(%r15); \ - ld %f13, STACK_F13(%r15); \ - ld %f14, STACK_F14(%r15); \ - ld %f15, STACK_F15(%r15); \ - lmg %r6, last_r, 6 * 8(%r1); \ - lgr %r15, %r1; \ - cfi_def_cfa_register(DW_REGNO_SP); - -#define PLUS(dst,src) \ - vaf dst, dst, src; - -#define XOR(dst,src) \ - vx dst, dst, src; - -#define ROTATE(v1,c) \ - verllf v1, v1, (c)(0); - -#define WORD_ROTATE(v1,s) \ - vsldb v1, v1, v1, ((s) * 4); - -#define DST_8(OPER, I, J) \ - OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \ - OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J); - -/********************************************************************** - round macros - **********************************************************************/ - -/********************************************************************** - 8-way chacha20 ("vertical") - **********************************************************************/ - -#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ - x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,\ - y8,y9,y10,y11,y12,y13,y14,y15,\ - op1,op2,op3,op4,op5,op6,op7,op8,\ - op9,op10,op11,op12) \ - op1; \ - PLUS(x0, x1); PLUS(x4, x5); \ - PLUS(x8, x9); PLUS(x12, x13); \ - PLUS(y0, y1); PLUS(y4, y5); \ - PLUS(y8, y9); PLUS(y12, y13); \ - op2; \ - XOR(x3, x0); XOR(x7, x4); \ - XOR(x11, x8); XOR(x15, x12); \ - XOR(y3, y0); XOR(y7, y4); \ - XOR(y11, y8); XOR(y15, y12); \ - op3; \ - ROTATE(x3, 16); ROTATE(x7, 16); \ - ROTATE(x11, 16); ROTATE(x15, 16); \ - ROTATE(y3, 16); ROTATE(y7, 16); \ - ROTATE(y11, 16); ROTATE(y15, 16); \ - op4; \ - PLUS(x2, x3); PLUS(x6, x7); \ - PLUS(x10, x11); PLUS(x14, x15); \ - PLUS(y2, y3); PLUS(y6, y7); \ - PLUS(y10, y11); PLUS(y14, y15); \ - op5; \ - XOR(x1, x2); XOR(x5, x6); \ - 
XOR(x9, x10); XOR(x13, x14); \ - XOR(y1, y2); XOR(y5, y6); \ - XOR(y9, y10); XOR(y13, y14); \ - op6; \ - ROTATE(x1,12); ROTATE(x5,12); \ - ROTATE(x9,12); ROTATE(x13,12); \ - ROTATE(y1,12); ROTATE(y5,12); \ - ROTATE(y9,12); ROTATE(y13,12); \ - op7; \ - PLUS(x0, x1); PLUS(x4, x5); \ - PLUS(x8, x9); PLUS(x12, x13); \ - PLUS(y0, y1); PLUS(y4, y5); \ - PLUS(y8, y9); PLUS(y12, y13); \ - op8; \ - XOR(x3, x0); XOR(x7, x4); \ - XOR(x11, x8); XOR(x15, x12); \ - XOR(y3, y0); XOR(y7, y4); \ - XOR(y11, y8); XOR(y15, y12); \ - op9; \ - ROTATE(x3,8); ROTATE(x7,8); \ - ROTATE(x11,8); ROTATE(x15,8); \ - ROTATE(y3,8); ROTATE(y7,8); \ - ROTATE(y11,8); ROTATE(y15,8); \ - op10; \ - PLUS(x2, x3); PLUS(x6, x7); \ - PLUS(x10, x11); PLUS(x14, x15); \ - PLUS(y2, y3); PLUS(y6, y7); \ - PLUS(y10, y11); PLUS(y14, y15); \ - op11; \ - XOR(x1, x2); XOR(x5, x6); \ - XOR(x9, x10); XOR(x13, x14); \ - XOR(y1, y2); XOR(y5, y6); \ - XOR(y9, y10); XOR(y13, y14); \ - op12; \ - ROTATE(x1,7); ROTATE(x5,7); \ - ROTATE(x9,7); ROTATE(x13,7); \ - ROTATE(y1,7); ROTATE(y5,7); \ - ROTATE(y9,7); ROTATE(y13,7); - -#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \ - QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ - x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,\ - y8,y9,y10,y11,y12,y13,y14,y15,\ - ,,,,,,,,,,,) - -#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \ - vmrhf tmp0, v0, v1; \ - vmrhf tmp1, v2, v3; \ - vmrlf tmp2, v0, v1; \ - vmrlf v3, v2, v3; \ - vmrhf tmpa, va, vb; \ - vmrhf tmpb, vc, vd; \ - vmrlf tmpc, va, vb; \ - vmrlf vd, vc, vd; \ - vpdi v0, tmp0, tmp1, 0; \ - vpdi v1, tmp0, tmp1, 5; \ - vpdi v2, tmp2, v3, 0; \ - vpdi v3, tmp2, v3, 5; \ - vpdi va, tmpa, tmpb, 0; \ - vpdi vb, tmpa, tmpb, 5; \ - vpdi vc, tmpc, vd, 0; \ - vpdi vd, tmpc, vd, 5; - -.balign 8 -.globl __chacha20_s390x_vx_blocks8 -ENTRY (__chacha20_s390x_vx_blocks8) - /* input: - * %r2: input - * %r3: dst - * 
%r4: src - * %r5: nblks (multiple of 8) - */ - - START_STACK(%r8); - lgr NBLKS, %r5; - - larl %r7, .Lconsts; - - /* Load counter. */ - lg %r8, (12 * 4)(INPUT); - rllg %r8, %r8, 32; - -.balign 4 - /* Process eight chacha20 blocks per loop. */ -.Lloop8: - vlm Y0, Y3, 0(INPUT); - - slgfi NBLKS, 8; - lghi ROUND, (20 / 2); - - /* Construct counter vectors X12/X13 & Y12/Y13. */ - vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7); - vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7); - vrepf Y12, Y3, 0; - vrepf Y13, Y3, 1; - vaccf X5, Y12, X4; - vaccf Y5, Y12, Y4; - vaf X12, Y12, X4; - vaf Y12, Y12, Y4; - vaf X13, Y13, X5; - vaf Y13, Y13, Y5; - - vrepf X0, Y0, 0; - vrepf X1, Y0, 1; - vrepf X2, Y0, 2; - vrepf X3, Y0, 3; - vrepf X4, Y1, 0; - vrepf X5, Y1, 1; - vrepf X6, Y1, 2; - vrepf X7, Y1, 3; - vrepf X8, Y2, 0; - vrepf X9, Y2, 1; - vrepf X10, Y2, 2; - vrepf X11, Y2, 3; - vrepf X14, Y3, 2; - vrepf X15, Y3, 3; - - /* Store counters for blocks 0-7. */ - vstm X12, X13, (STACK_CTR + 0 * 16)(%r15); - vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15); - - vlr Y0, X0; - vlr Y1, X1; - vlr Y2, X2; - vlr Y3, X3; - vlr Y4, X4; - vlr Y5, X5; - vlr Y6, X6; - vlr Y7, X7; - vlr Y8, X8; - vlr Y9, X9; - vlr Y10, X10; - vlr Y11, X11; - vlr Y14, X14; - vlr Y15, X15; - - /* Update and store counter. */ - agfi %r8, 8; - rllg %r5, %r8, 32; - stg %r5, (12 * 4)(INPUT); - -.balign 4 -.Lround2_8: - QUARTERROUND4_V8(X0, X4, X8, X12, X1, X5, X9, X13, - X2, X6, X10, X14, X3, X7, X11, X15, - Y0, Y4, Y8, Y12, Y1, Y5, Y9, Y13, - Y2, Y6, Y10, Y14, Y3, Y7, Y11, Y15); - QUARTERROUND4_V8(X0, X5, X10, X15, X1, X6, X11, X12, - X2, X7, X8, X13, X3, X4, X9, X14, - Y0, Y5, Y10, Y15, Y1, Y6, Y11, Y12, - Y2, Y7, Y8, Y13, Y3, Y4, Y9, Y14); - brctg ROUND, .Lround2_8; - - /* Store blocks 4-7. */ - vstm Y0, Y15, STACK_Y0_Y15(%r15); - - /* Load counters for blocks 0-3. */ - vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15); - - lghi ROUND, 1; - j .Lfirst_output_4blks_8; - -.balign 4 -.Lsecond_output_4blks_8: - /* Load blocks 4-7. 
*/ - vlm X0, X15, STACK_Y0_Y15(%r15); - - /* Load counters for blocks 4-7. */ - vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15); - - lghi ROUND, 0; - -.balign 4 - /* Output four chacha20 blocks per loop. */ -.Lfirst_output_4blks_8: - vlm Y12, Y15, 0(INPUT); - PLUS(X12, Y0); - PLUS(X13, Y1); - vrepf Y0, Y12, 0; - vrepf Y1, Y12, 1; - vrepf Y2, Y12, 2; - vrepf Y3, Y12, 3; - vrepf Y4, Y13, 0; - vrepf Y5, Y13, 1; - vrepf Y6, Y13, 2; - vrepf Y7, Y13, 3; - vrepf Y8, Y14, 0; - vrepf Y9, Y14, 1; - vrepf Y10, Y14, 2; - vrepf Y11, Y14, 3; - vrepf Y14, Y15, 2; - vrepf Y15, Y15, 3; - PLUS(X0, Y0); - PLUS(X1, Y1); - PLUS(X2, Y2); - PLUS(X3, Y3); - PLUS(X4, Y4); - PLUS(X5, Y5); - PLUS(X6, Y6); - PLUS(X7, Y7); - PLUS(X8, Y8); - PLUS(X9, Y9); - PLUS(X10, Y10); - PLUS(X11, Y11); - PLUS(X14, Y14); - PLUS(X15, Y15); - - vl Y15, (.Lbswap32 - .Lconsts)(%r7); - TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7, - Y9, Y10, Y11, Y12, Y13, Y14); - TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15, - Y9, Y10, Y11, Y12, Y13, Y14); - - vlm Y0, Y14, 0(SRC); - vperm X0, X0, X0, Y15; - vperm X1, X1, X1, Y15; - vperm X2, X2, X2, Y15; - vperm X3, X3, X3, Y15; - vperm X4, X4, X4, Y15; - vperm X5, X5, X5, Y15; - vperm X6, X6, X6, Y15; - vperm X7, X7, X7, Y15; - vperm X8, X8, X8, Y15; - vperm X9, X9, X9, Y15; - vperm X10, X10, X10, Y15; - vperm X11, X11, X11, Y15; - vperm X12, X12, X12, Y15; - vperm X13, X13, X13, Y15; - vperm X14, X14, X14, Y15; - vperm X15, X15, X15, Y15; - vl Y15, (15 * 16)(SRC); - - XOR(Y0, X0); - XOR(Y1, X4); - XOR(Y2, X8); - XOR(Y3, X12); - XOR(Y4, X1); - XOR(Y5, X5); - XOR(Y6, X9); - XOR(Y7, X13); - XOR(Y8, X2); - XOR(Y9, X6); - XOR(Y10, X10); - XOR(Y11, X14); - XOR(Y12, X3); - XOR(Y13, X7); - XOR(Y14, X11); - XOR(Y15, X15); - vstm Y0, Y15, 0(DST); - - aghi SRC, 256; - aghi DST, 256; - - clgije ROUND, 1, .Lsecond_output_4blks_8; - - clgijhe NBLKS, 8, .Lloop8; - - - END_STACK(%r8); - xgr %r2, %r2; - br %r14; -END (__chacha20_s390x_vx_blocks8) - -#endif /* HAVE_S390_VX_ASM_SUPPORT */ diff 
--git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h deleted file mode 100644 index 0c6abf77e8..0000000000 --- a/sysdeps/s390/s390-64/chacha20_arch.h +++ /dev/null @@ -1,45 +0,0 @@ -/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <stdbool.h> -#include <ldsodefs.h> -#include <sys/auxv.h> - -unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static inline void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ -#ifdef HAVE_S390_VX_ASM_SUPPORT - _Static_assert (CHACHA20_BUFSIZE % 8 == 0, - "CHACHA20_BUFSIZE not multiple of 8"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); - - if (GLRO(dl_hwcap) & HWCAP_S390_VX) - { - __chacha20_s390x_vx_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); - return; - } -#endif - chacha20_crypt_generic (state, dst, src, bytes); -} diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c index 0326ebb767..c8a9ed2d40 100644 --- a/sysdeps/unix/sysv/linux/tls-internal.c +++ b/sysdeps/unix/sysv/linux/tls-internal.c @@ -16,7 +16,6 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. */ -#include <stdlib/arc4random.h> #include <string.h> #include <tls-internal.h> @@ -26,13 +25,4 @@ __glibc_tls_internal_free (void) struct pthread *self = THREAD_SELF; free (self->tls_state.strsignal_buf); free (self->tls_state.strerror_l_buf); - - if (self->tls_state.rand_state != NULL) - { - /* Clear any lingering random state prior so if the thread stack is - cached it won't leak any data. 
*/ - explicit_bzero (self->tls_state.rand_state, - sizeof (*self->tls_state.rand_state)); - free (self->tls_state.rand_state); - } } diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile index 1178475d75..c19bef2dec 100644 --- a/sysdeps/x86_64/Makefile +++ b/sysdeps/x86_64/Makefile @@ -5,13 +5,6 @@ ifeq ($(subdir),csu) gen-as-const-headers += link-defines.sym endif -ifeq ($(subdir),stdlib) -sysdep_routines += \ - chacha20-amd64-sse2 \ - chacha20-amd64-avx2 \ - # sysdep_routines -endif - ifeq ($(subdir),gmon) sysdep_routines += _mcount # We cannot compile _mcount.S with -pg because that would create diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S deleted file mode 100644 index aefd1cdbd0..0000000000 --- a/sysdeps/x86_64/chacha20-amd64-avx2.S +++ /dev/null @@ -1,328 +0,0 @@ -/* Optimized AVX2 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-amd64-avx2.S - AVX2 implementation of ChaCha20 cipher - - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. 
- - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. -*/ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. */ - -#include <sysdep.h> - -#ifdef PIC -# define rRIP (%rip) -#else -# define rRIP -#endif - -/* register macros */ -#define INPUT %rdi -#define DST %rsi -#define SRC %rdx -#define NBLKS %rcx -#define ROUND %eax - -/* stack structure */ -#define STACK_VEC_X12 (32) -#define STACK_VEC_X13 (32 + STACK_VEC_X12) -#define STACK_TMP (32 + STACK_VEC_X13) -#define STACK_TMP1 (32 + STACK_TMP) - -#define STACK_MAX (32 + STACK_TMP1) - -/* vector registers */ -#define X0 %ymm0 -#define X1 %ymm1 -#define X2 %ymm2 -#define X3 %ymm3 -#define X4 %ymm4 -#define X5 %ymm5 -#define X6 %ymm6 -#define X7 %ymm7 -#define X8 %ymm8 -#define X9 %ymm9 -#define X10 %ymm10 -#define X11 %ymm11 -#define X12 %ymm12 -#define X13 %ymm13 -#define X14 %ymm14 -#define X15 %ymm15 - -#define X0h %xmm0 -#define X1h %xmm1 -#define X2h %xmm2 -#define X3h %xmm3 -#define X4h %xmm4 -#define X5h %xmm5 -#define X6h %xmm6 -#define X7h %xmm7 -#define X8h %xmm8 -#define X9h %xmm9 -#define X10h %xmm10 -#define X11h %xmm11 -#define X12h %xmm12 -#define X13h %xmm13 -#define X14h %xmm14 -#define X15h %xmm15 - -/********************************************************************** - helper macros - 
**********************************************************************/ - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0,x1,x2,x3,t1,t2) \ - vpunpckhdq x1, x0, t2; \ - vpunpckldq x1, x0, x0; \ - \ - vpunpckldq x3, x2, t1; \ - vpunpckhdq x3, x2, x2; \ - \ - vpunpckhqdq t1, x0, x1; \ - vpunpcklqdq t1, x0, x0; \ - \ - vpunpckhqdq x2, t2, x3; \ - vpunpcklqdq x2, t2, x2; - -/* 2x2 128-bit matrix transpose */ -#define transpose_16byte_2x2(x0,x1,t1) \ - vmovdqa x0, t1; \ - vperm2i128 $0x20, x1, x0, x0; \ - vperm2i128 $0x31, x1, t1, x1; - -/********************************************************************** - 8-way chacha20 - **********************************************************************/ - -#define ROTATE2(v1,v2,c,tmp) \ - vpsrld $(32 - (c)), v1, tmp; \ - vpslld $(c), v1, v1; \ - vpaddb tmp, v1, v1; \ - vpsrld $(32 - (c)), v2, tmp; \ - vpslld $(c), v2, v2; \ - vpaddb tmp, v2, v2; - -#define ROTATE_SHUF_2(v1,v2,shuf) \ - vpshufb shuf, v1, v1; \ - vpshufb shuf, v2, v2; - -#define XOR(ds,s) \ - vpxor s, ds, ds; - -#define PLUS(ds,s) \ - vpaddd s, ds, ds; - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\ - interleave_op1,interleave_op2,\ - interleave_op3,interleave_op4) \ - vbroadcasti128 .Lshuf_rol16 rRIP, tmp1; \ - interleave_op1; \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE_SHUF_2(d1, d2, tmp1); \ - interleave_op2; \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 12, tmp1); \ - vbroadcasti128 .Lshuf_rol8 rRIP, tmp1; \ - interleave_op3; \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE_SHUF_2(d1, d2, tmp1); \ - interleave_op4; \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 7, tmp1); - - .section .text.avx2, "ax", @progbits - .align 32 -chacha20_data: -L(shuf_rol16): - .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13 -L(shuf_rol8): - .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14 -L(inc_counter): - .byte 0,1,2,3,4,5,6,7 -L(unsigned_cmp): - .long 
0x80000000 - - .hidden __chacha20_avx2_blocks8 -ENTRY (__chacha20_avx2_blocks8) - /* input: - * %rdi: input - * %rsi: dst - * %rdx: src - * %rcx: nblks (multiple of 8) - */ - vzeroupper; - - pushq %rbp; - cfi_adjust_cfa_offset(8); - cfi_rel_offset(rbp, 0) - movq %rsp, %rbp; - cfi_def_cfa_register(rbp); - - subq $STACK_MAX, %rsp; - andq $~31, %rsp; - -L(loop8): - mov $20, ROUND; - - /* Construct counter vectors X12 and X13 */ - vpmovzxbd L(inc_counter) rRIP, X0; - vpbroadcastd L(unsigned_cmp) rRIP, X2; - vpbroadcastd (12 * 4)(INPUT), X12; - vpbroadcastd (13 * 4)(INPUT), X13; - vpaddd X0, X12, X12; - vpxor X2, X0, X0; - vpxor X2, X12, X1; - vpcmpgtd X1, X0, X0; - vpsubd X0, X13, X13; - vmovdqa X12, (STACK_VEC_X12)(%rsp); - vmovdqa X13, (STACK_VEC_X13)(%rsp); - - /* Load vectors */ - vpbroadcastd (0 * 4)(INPUT), X0; - vpbroadcastd (1 * 4)(INPUT), X1; - vpbroadcastd (2 * 4)(INPUT), X2; - vpbroadcastd (3 * 4)(INPUT), X3; - vpbroadcastd (4 * 4)(INPUT), X4; - vpbroadcastd (5 * 4)(INPUT), X5; - vpbroadcastd (6 * 4)(INPUT), X6; - vpbroadcastd (7 * 4)(INPUT), X7; - vpbroadcastd (8 * 4)(INPUT), X8; - vpbroadcastd (9 * 4)(INPUT), X9; - vpbroadcastd (10 * 4)(INPUT), X10; - vpbroadcastd (11 * 4)(INPUT), X11; - vpbroadcastd (14 * 4)(INPUT), X14; - vpbroadcastd (15 * 4)(INPUT), X15; - vmovdqa X15, (STACK_TMP)(%rsp); - -L(round2): - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X15,,,,) - vmovdqa (STACK_TMP)(%rsp), X15; - vmovdqa X8, (STACK_TMP)(%rsp); - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,,,,) - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,,,,) - vmovdqa (STACK_TMP)(%rsp), X8; - vmovdqa X15, (STACK_TMP)(%rsp); - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X15,,,,) - sub $2, ROUND; - jnz L(round2); - - vmovdqa X8, (STACK_TMP1)(%rsp); - - /* tmp := X15 */ - vpbroadcastd (0 * 4)(INPUT), X15; - PLUS(X0, X15); - vpbroadcastd (1 * 4)(INPUT), X15; - PLUS(X1, X15); - vpbroadcastd (2 * 4)(INPUT), X15; - PLUS(X2, X15); - vpbroadcastd (3 
* 4)(INPUT), X15; - PLUS(X3, X15); - vpbroadcastd (4 * 4)(INPUT), X15; - PLUS(X4, X15); - vpbroadcastd (5 * 4)(INPUT), X15; - PLUS(X5, X15); - vpbroadcastd (6 * 4)(INPUT), X15; - PLUS(X6, X15); - vpbroadcastd (7 * 4)(INPUT), X15; - PLUS(X7, X15); - transpose_4x4(X0, X1, X2, X3, X8, X15); - transpose_4x4(X4, X5, X6, X7, X8, X15); - vmovdqa (STACK_TMP1)(%rsp), X8; - transpose_16byte_2x2(X0, X4, X15); - transpose_16byte_2x2(X1, X5, X15); - transpose_16byte_2x2(X2, X6, X15); - transpose_16byte_2x2(X3, X7, X15); - vmovdqa (STACK_TMP)(%rsp), X15; - vmovdqu X0, (64 * 0 + 16 * 0)(DST) - vmovdqu X1, (64 * 1 + 16 * 0)(DST) - vpbroadcastd (8 * 4)(INPUT), X0; - PLUS(X8, X0); - vpbroadcastd (9 * 4)(INPUT), X0; - PLUS(X9, X0); - vpbroadcastd (10 * 4)(INPUT), X0; - PLUS(X10, X0); - vpbroadcastd (11 * 4)(INPUT), X0; - PLUS(X11, X0); - vmovdqa (STACK_VEC_X12)(%rsp), X0; - PLUS(X12, X0); - vmovdqa (STACK_VEC_X13)(%rsp), X0; - PLUS(X13, X0); - vpbroadcastd (14 * 4)(INPUT), X0; - PLUS(X14, X0); - vpbroadcastd (15 * 4)(INPUT), X0; - PLUS(X15, X0); - vmovdqu X2, (64 * 2 + 16 * 0)(DST) - vmovdqu X3, (64 * 3 + 16 * 0)(DST) - - /* Update counter */ - addq $8, (12 * 4)(INPUT); - - transpose_4x4(X8, X9, X10, X11, X0, X1); - transpose_4x4(X12, X13, X14, X15, X0, X1); - vmovdqu X4, (64 * 4 + 16 * 0)(DST) - vmovdqu X5, (64 * 5 + 16 * 0)(DST) - transpose_16byte_2x2(X8, X12, X0); - transpose_16byte_2x2(X9, X13, X0); - transpose_16byte_2x2(X10, X14, X0); - transpose_16byte_2x2(X11, X15, X0); - vmovdqu X6, (64 * 6 + 16 * 0)(DST) - vmovdqu X7, (64 * 7 + 16 * 0)(DST) - vmovdqu X8, (64 * 0 + 16 * 2)(DST) - vmovdqu X9, (64 * 1 + 16 * 2)(DST) - vmovdqu X10, (64 * 2 + 16 * 2)(DST) - vmovdqu X11, (64 * 3 + 16 * 2)(DST) - vmovdqu X12, (64 * 4 + 16 * 2)(DST) - vmovdqu X13, (64 * 5 + 16 * 2)(DST) - vmovdqu X14, (64 * 6 + 16 * 2)(DST) - vmovdqu X15, (64 * 7 + 16 * 2)(DST) - - sub $8, NBLKS; - lea (8 * 64)(DST), DST; - lea (8 * 64)(SRC), SRC; - jnz L(loop8); - - vzeroupper; - - /* eax zeroed by round loop. 
*/ - leave; - cfi_adjust_cfa_offset(-8) - cfi_def_cfa_register(%rsp); - ret; - int3; -END(__chacha20_avx2_blocks8) diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S deleted file mode 100644 index 351a1109c6..0000000000 --- a/sysdeps/x86_64/chacha20-amd64-sse2.S +++ /dev/null @@ -1,311 +0,0 @@ -/* Optimized SSE2 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-amd64-ssse3.S - SSSE3 implementation of ChaCha20 cipher - - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. 
-*/ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. */ - -#include <sysdep.h> -#include <isa-level.h> - -#if MINIMUM_X86_ISA_LEVEL <= 2 - -#ifdef PIC -# define rRIP (%rip) -#else -# define rRIP -#endif - -/* 'ret' instruction replacement for straight-line speculation mitigation */ -#define ret_spec_stop \ - ret; int3; - -/* register macros */ -#define INPUT %rdi -#define DST %rsi -#define SRC %rdx -#define NBLKS %rcx -#define ROUND %eax - -/* stack structure */ -#define STACK_VEC_X12 (16) -#define STACK_VEC_X13 (16 + STACK_VEC_X12) -#define STACK_TMP (16 + STACK_VEC_X13) -#define STACK_TMP1 (16 + STACK_TMP) -#define STACK_TMP2 (16 + STACK_TMP1) - -#define STACK_MAX (16 + STACK_TMP2) - -/* vector registers */ -#define X0 %xmm0 -#define X1 %xmm1 -#define X2 %xmm2 -#define X3 %xmm3 -#define X4 %xmm4 -#define X5 %xmm5 -#define X6 %xmm6 -#define X7 %xmm7 -#define X8 %xmm8 -#define X9 %xmm9 -#define X10 %xmm10 -#define X11 %xmm11 -#define X12 %xmm12 -#define X13 %xmm13 -#define X14 %xmm14 -#define X15 %xmm15 - -/********************************************************************** - helper macros - **********************************************************************/ - -/* 4x4 32-bit integer matrix transpose */ -#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \ - movdqa x0, t2; \ - punpckhdq x1, t2; \ - punpckldq x1, x0; \ - \ - movdqa x2, t1; \ - punpckldq x3, t1; \ - punpckhdq x3, x2; \ - \ - movdqa x0, x1; \ - punpckhqdq t1, x1; \ - punpcklqdq t1, x0; \ - \ - movdqa t2, x3; \ - punpckhqdq x2, x3; \ - punpcklqdq x2, t2; \ - movdqa t2, x2; - -/* fill xmm register with 32-bit value from memory */ -#define PBROADCASTD(mem32, xreg) \ - movd mem32, xreg; \ - pshufd $0, xreg, xreg; - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define 
ROTATE2(v1,v2,c,tmp1,tmp2) \ - movdqa v1, tmp1; \ - movdqa v2, tmp2; \ - psrld $(32 - (c)), v1; \ - pslld $(c), tmp1; \ - paddb tmp1, v1; \ - psrld $(32 - (c)), v2; \ - pslld $(c), tmp2; \ - paddb tmp2, v2; - -#define XOR(ds,s) \ - pxor s, ds; - -#define PLUS(ds,s) \ - paddd s, ds; - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2) \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE2(d1, d2, 16, tmp1, tmp2); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 12, tmp1, tmp2); \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE2(d1, d2, 8, tmp1, tmp2); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 7, tmp1, tmp2); - - .section .text.sse2,"ax",@progbits - -chacha20_data: - .align 16 -L(counter1): - .long 1,0,0,0 -L(inc_counter): - .long 0,1,2,3 -L(unsigned_cmp): - .long 0x80000000,0x80000000,0x80000000,0x80000000 - - .hidden __chacha20_sse2_blocks4 -ENTRY (__chacha20_sse2_blocks4) - /* input: - * %rdi: input - * %rsi: dst - * %rdx: src - * %rcx: nblks (multiple of 4) - */ - - pushq %rbp; - cfi_adjust_cfa_offset(8); - cfi_rel_offset(rbp, 0) - movq %rsp, %rbp; - cfi_def_cfa_register(%rbp); - - subq $STACK_MAX, %rsp; - andq $~15, %rsp; - -L(loop4): - mov $20, ROUND; - - /* Construct counter vectors X12 and X13 */ - movdqa L(inc_counter) rRIP, X0; - movdqa L(unsigned_cmp) rRIP, X2; - PBROADCASTD((12 * 4)(INPUT), X12); - PBROADCASTD((13 * 4)(INPUT), X13); - paddd X0, X12; - movdqa X12, X1; - pxor X2, X0; - pxor X2, X1; - pcmpgtd X1, X0; - psubd X0, X13; - movdqa X12, (STACK_VEC_X12)(%rsp); - movdqa X13, (STACK_VEC_X13)(%rsp); - - /* Load vectors */ - PBROADCASTD((0 * 4)(INPUT), X0); - PBROADCASTD((1 * 4)(INPUT), X1); - PBROADCASTD((2 * 4)(INPUT), X2); - PBROADCASTD((3 * 4)(INPUT), X3); - PBROADCASTD((4 * 4)(INPUT), X4); - PBROADCASTD((5 * 4)(INPUT), X5); - PBROADCASTD((6 * 4)(INPUT), X6); - PBROADCASTD((7 * 4)(INPUT), X7); - PBROADCASTD((8 * 4)(INPUT), X8); - PBROADCASTD((9 * 
4)(INPUT), X9); - PBROADCASTD((10 * 4)(INPUT), X10); - PBROADCASTD((11 * 4)(INPUT), X11); - PBROADCASTD((14 * 4)(INPUT), X14); - PBROADCASTD((15 * 4)(INPUT), X15); - movdqa X11, (STACK_TMP)(%rsp); - movdqa X15, (STACK_TMP1)(%rsp); - -L(round2_4): - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X11,X15) - movdqa (STACK_TMP)(%rsp), X11; - movdqa (STACK_TMP1)(%rsp), X15; - movdqa X8, (STACK_TMP)(%rsp); - movdqa X9, (STACK_TMP1)(%rsp); - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,X9) - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,X9) - movdqa (STACK_TMP)(%rsp), X8; - movdqa (STACK_TMP1)(%rsp), X9; - movdqa X11, (STACK_TMP)(%rsp); - movdqa X15, (STACK_TMP1)(%rsp); - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X11,X15) - sub $2, ROUND; - jnz L(round2_4); - - /* tmp := X15 */ - movdqa (STACK_TMP)(%rsp), X11; - PBROADCASTD((0 * 4)(INPUT), X15); - PLUS(X0, X15); - PBROADCASTD((1 * 4)(INPUT), X15); - PLUS(X1, X15); - PBROADCASTD((2 * 4)(INPUT), X15); - PLUS(X2, X15); - PBROADCASTD((3 * 4)(INPUT), X15); - PLUS(X3, X15); - PBROADCASTD((4 * 4)(INPUT), X15); - PLUS(X4, X15); - PBROADCASTD((5 * 4)(INPUT), X15); - PLUS(X5, X15); - PBROADCASTD((6 * 4)(INPUT), X15); - PLUS(X6, X15); - PBROADCASTD((7 * 4)(INPUT), X15); - PLUS(X7, X15); - PBROADCASTD((8 * 4)(INPUT), X15); - PLUS(X8, X15); - PBROADCASTD((9 * 4)(INPUT), X15); - PLUS(X9, X15); - PBROADCASTD((10 * 4)(INPUT), X15); - PLUS(X10, X15); - PBROADCASTD((11 * 4)(INPUT), X15); - PLUS(X11, X15); - movdqa (STACK_VEC_X12)(%rsp), X15; - PLUS(X12, X15); - movdqa (STACK_VEC_X13)(%rsp), X15; - PLUS(X13, X15); - movdqa X13, (STACK_TMP)(%rsp); - PBROADCASTD((14 * 4)(INPUT), X15); - PLUS(X14, X15); - movdqa (STACK_TMP1)(%rsp), X15; - movdqa X14, (STACK_TMP1)(%rsp); - PBROADCASTD((15 * 4)(INPUT), X13); - PLUS(X15, X13); - movdqa X15, (STACK_TMP2)(%rsp); - - /* Update counter */ - addq $4, (12 * 4)(INPUT); - - TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15); - movdqu X0, (64 * 0 + 16 * 0)(DST) - 
movdqu X1, (64 * 1 + 16 * 0)(DST) - movdqu X2, (64 * 2 + 16 * 0)(DST) - movdqu X3, (64 * 3 + 16 * 0)(DST) - TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2); - movdqa (STACK_TMP)(%rsp), X13; - movdqa (STACK_TMP1)(%rsp), X14; - movdqa (STACK_TMP2)(%rsp), X15; - movdqu X4, (64 * 0 + 16 * 1)(DST) - movdqu X5, (64 * 1 + 16 * 1)(DST) - movdqu X6, (64 * 2 + 16 * 1)(DST) - movdqu X7, (64 * 3 + 16 * 1)(DST) - TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2); - movdqu X8, (64 * 0 + 16 * 2)(DST) - movdqu X9, (64 * 1 + 16 * 2)(DST) - movdqu X10, (64 * 2 + 16 * 2)(DST) - movdqu X11, (64 * 3 + 16 * 2)(DST) - TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2); - movdqu X12, (64 * 0 + 16 * 3)(DST) - movdqu X13, (64 * 1 + 16 * 3)(DST) - movdqu X14, (64 * 2 + 16 * 3)(DST) - movdqu X15, (64 * 3 + 16 * 3)(DST) - - sub $4, NBLKS; - lea (4 * 64)(DST), DST; - lea (4 * 64)(SRC), SRC; - jnz L(loop4); - - /* eax zeroed by round loop. */ - leave; - cfi_adjust_cfa_offset(-8) - cfi_def_cfa_register(%rsp); - ret_spec_stop; -END (__chacha20_sse2_blocks4) - -#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */ diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h deleted file mode 100644 index 6f3784e392..0000000000 --- a/sysdeps/x86_64/chacha20_arch.h +++ /dev/null @@ -1,55 +0,0 @@ -/* Chacha20 implementation, used on arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. 
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <isa-level.h>
-#include <ldsodefs.h>
-#include <cpu-features.h>
-#include <sys/param.h>
-
-unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst,
-                                      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
-                                      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-                size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0,
-                  "CHACHA20_BUFSIZE not multiple of 4 or 8");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
-                  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
-
-#if MINIMUM_X86_ISA_LEVEL > 2
-  __chacha20_avx2_blocks8 (state, dst, src,
-                           CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  const struct cpu_features* cpu_features = __get_cpu_features ();
-
-  /* AVX2 version uses vzeroupper, so disable it if RTM is enabled.  */
-  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
-      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !))
-    __chacha20_avx2_blocks8 (state, dst, src,
-                             CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    __chacha20_sse2_blocks4 (state, dst, src,
-                             CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#endif
-}
-- 
2.35.1

^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-25 23:28 ` [PATCH v2] " Jason A. Donenfeld
@ 2022-07-25 23:59   ` Eric Biggers
  2022-07-26 10:26     ` Jason A. Donenfeld
  2022-07-26 1:10   ` Mark Harris
                    ` (2 subsequent siblings)
  3 siblings, 1 reply; 81+ messages in thread
From: Eric Biggers @ 2022-07-25 23:59 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: libc-alpha, Adhemerval Zanella Netto, Florian Weimer,
      Cristian Rodríguez, Paul Eggert, linux-crypto

On Tue, Jul 26, 2022 at 01:28:10AM +0200, Jason A. Donenfeld wrote:
> Rather than buffering 16 MiB of entropy in userspace (by way of
> chacha20), simply call getrandom() every time.
>
> This approach is doubtlessly slower, for now, but trying to prematurely
> optimize arc4random appears to be leading toward all sorts of nasty
> properties and gotchas. Instead, this patch takes a much more
> conservative approach. The interface is added as a basic loop wrapper
> around getrandom(), and then later, the kernel and libc can work
> together on optimizing that.
>
> This prevents numerous issues in which userspace is unaware of when it
> really must throw away its buffer, since we avoid buffering
> altogether. Future improvements may include userspace learning more from
> the kernel about when to do that, which might make these sorts of
> chacha20-based optimizations more possible. The current heuristic of 16
> MiB is meaningless garbage that doesn't correspond to anything the
> kernel might know about. So for now, let's just do something
> conservative that we know is correct and won't lead to cryptographic
> issues for users of this function.
>
> This patch might be considered along the lines of, "optimization is the
> root of all evil," in that the much more complex implementation it
> replaces moves too fast without considering security implications,
> whereas the incremental approach done here is a much safer way of going
> about things. Once this lands, we can take our time in optimizing this
> properly using new interplay between the kernel and userspace.
>
> getrandom(0) is used, since that's the one that ensures the bytes
> returned are cryptographically secure. But on systems without it, we
> fall back to using /dev/urandom. This is unfortunate because it means
> opening a file descriptor, but there's not much of a choice. Secondly,
> as part of the fallback, in order to get more or less the same
> properties of getrandom(0), we poll on /dev/random, and if the poll
> succeeds at least once, then we assume the RNG is initialized. This is a
> rough approximation, as the ancient "non-blocking pool" initialized
> after the "blocking pool", not before, but it's the best approximation
> we can do.
>
> The motivation for including arc4random, in the first place, is to have
> source-level compatibility with existing code. That means this patch
> doesn't attempt to litigate the interface itself. It does, however,
> choose a conservative approach for implementing it.
>
> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Florian Weimer <fweimer@redhat.com>
> Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
> Cc: Paul Eggert <eggert@cs.ucla.edu>
> Cc: linux-crypto@vger.kernel.org
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

This looks good to me.

There are still a few bits that need to be removed/updated. With a quick grep,
I found:

sysdeps/generic/tls-internal-struct.h: struct arc4random_state_t *rand_state;

sysdeps/unix/sysv/linux/tls-internal.h:/* Reset the arc4random TCB state on fork. *

NEWS: ... The functions use a pseudo-random number generator along with
NEWS: entropy from the kernel.

Also, the documentation in manual/math.texi should say that the randomness is
cryptographically secure.

- Eric

^ permalink raw reply [flat|nested] 81+ messages in thread
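[Editor's sketch] The "basic loop wrapper around getrandom()" described in the quoted commit message could look roughly like the following. This is a minimal illustration, not the patch's code: it assumes the public getrandom() wrapper from <sys/random.h> (which reports errors through errno) rather than glibc's internal __getrandom_nocancel, the name fill_random is hypothetical, and the /dev/urandom fallback is left out, so ENOSYS is simply reported to the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper (not from the patch): fill buf with n bytes from
   getrandom(2), looping over short reads.  Returns 0 on success and -1
   with errno set on failure, e.g. ENOSYS on kernels without the
   syscall.  */
static int
fill_random (void *buf, size_t n)
{
  uint8_t *p = buf;

  while (n > 0)
    {
      ssize_t l = getrandom (p, n, 0);
      if (l > 0)
        {
          /* Possibly a short read; advance and keep going.  */
          p += l;
          n -= (size_t) l;
        }
      else if (l < 0 && errno == EINTR)
        continue; /* Interrupted before any byte was produced.  */
      else
        {
          if (l == 0)
            errno = EIO; /* Should never happen; treat as an error.  */
          return -1;
        }
    }
  return 0;
}
```

A caller that wants the arc4random behaviour discussed in the thread would treat a -1 return as fatal, or switch to a fallback path when errno is ENOSYS.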
* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-25 23:59 ` Eric Biggers
@ 2022-07-26 10:26   ` Jason A. Donenfeld
  0 siblings, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 10:26 UTC (permalink / raw)
  To: Eric Biggers
  Cc: libc-alpha, Adhemerval Zanella Netto, Florian Weimer,
      Cristian Rodríguez, Paul Eggert, linux-crypto

Hi Eric,

On Mon, Jul 25, 2022 at 04:59:17PM -0700, Eric Biggers wrote:
> This looks good to me.
>
> There are still a few bits that need to be removed/updated. With a quick grep,
> I found:
>
> sysdeps/generic/tls-internal-struct.h: struct arc4random_state_t *rand_state;
>
> sysdeps/unix/sysv/linux/tls-internal.h:/* Reset the arc4random TCB state on fork. *
>
> NEWS: ... The functions use a pseudo-random number generator along with
> NEWS: entropy from the kernel.
>
>
> Also, the documentation in manual/math.texi should say that the randomness is
> cryptographically secure.

Thanks for the notes. I'll clean that all up in v3.

Jason

^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-25 23:28 ` [PATCH v2] " Jason A. Donenfeld
  2022-07-25 23:59   ` Eric Biggers
@ 2022-07-26 1:10   ` Mark Harris
  2022-07-26 10:41     ` Jason A. Donenfeld
  2022-07-26 9:55   ` Florian Weimer
  2022-07-26 11:33   ` Adhemerval Zanella Netto
  3 siblings, 1 reply; 81+ messages in thread
From: Mark Harris @ 2022-07-26 1:10 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: libc-alpha, Florian Weimer, linux-crypto

Jason A. Donenfeld wrote:
> +      l = __getrandom_nocancel (p, n, 0);
> +      if (l > 0)
> +        {
> +          if ((size_t) l == n)
> +            return; /* Done reading, success. */
> +          p = (uint8_t *) p + l;
> +          n -= l;
> +          continue; /* Interrupted by a signal; keep going. */
> +        }
> +      else if (l == 0)
> +        arc4random_getrandom_failure (); /* Weird, should never happen. */
> +      else if (errno == ENOSYS)
> +        {
> +          have_getrandom = false;
> +          break; /* No syscall, so fallback to /dev/urandom. */
> +        }
> +      arc4random_getrandom_failure (); /* Unknown error, should never happen. */

Isn't EINTR also possible? Aborting in that case does not seem reasonable.

Also the __getrandom_nocancel function does not set errno on Linux; it
just returns INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags).
So unless that is changed, it doesn't look like this ENOSYS check will
detect old Linux kernels.

> +      struct pollfd pfd = { .events = POLLIN };
> +      pfd.fd = TEMP_FAILURE_RETRY (
> +          __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> +      if (pfd.fd < 0)
> +        arc4random_getrandom_failure ();
> +      if (__poll (&pfd, 1, -1) < 0)
> +        arc4random_getrandom_failure ();
> +      if (__close_nocancel (pfd.fd) < 0)
> +        arc4random_getrandom_failure ();

The TEMP_FAILURE_RETRY handles EINTR on open, but __poll can also
result in EINTR.

 - Mark

^ permalink raw reply [flat|nested] 81+ messages in thread
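[Editor's sketch] Mark's two points about the getrandom loop can be illustrated with a small decision function. It assumes, as he describes for __getrandom_nocancel on Linux, that the call returns a negative errno value directly instead of setting errno. The enum and the function name are hypothetical and exist only for this example; they are not part of glibc or of the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical outcome codes for one getrandom attempt.  */
enum rnd_action
{
  RND_DONE,     /* All requested bytes were returned.  */
  RND_PARTIAL,  /* Short read: advance the buffer and call again.  */
  RND_RETRY,    /* -EINTR: nothing was read, call again.  */
  RND_FALLBACK, /* -ENOSYS: no syscall, use the /dev/urandom path.  */
  RND_FATAL     /* Anything else, including a zero return.  */
};

/* Classify a raw return value that follows the "negative errno"
   convention, so errno itself is never consulted.  */
static enum rnd_action
classify_result (long ret, size_t want)
{
  if (ret > 0)
    {
      if ((size_t) ret > want)
        return RND_FATAL; /* More bytes than requested: not expected.  */
      return (size_t) ret == want ? RND_DONE : RND_PARTIAL;
    }
  if (ret == -EINTR)
    return RND_RETRY;
  if (ret == -ENOSYS)
    return RND_FALLBACK;
  return RND_FATAL;
}
```

Keeping the classification separate from the syscall makes the two cases raised in the review (EINTR must not abort, ENOSYS must be detected from the return value) easy to check in isolation.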
* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 1:10 ` Mark Harris
@ 2022-07-26 10:41   ` Jason A. Donenfeld
  2022-07-26 11:06     ` Florian Weimer
  2022-07-26 16:51     ` Mark Harris
  0 siblings, 2 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 10:41 UTC (permalink / raw)
  To: Mark Harris; +Cc: libc-alpha, Florian Weimer, linux-crypto

Hi Mark,

On Mon, Jul 25, 2022 at 06:10:06PM -0700, Mark Harris wrote:
> Jason A. Donenfeld wrote:
> > +      l = __getrandom_nocancel (p, n, 0);
> > +      if (l > 0)
> > +        {
> > +          if ((size_t) l == n)
> > +            return; /* Done reading, success. */
> > +          p = (uint8_t *) p + l;
> > +          n -= l;
> > +          continue; /* Interrupted by a signal; keep going. */
> > +        }
> > +      else if (l == 0)
> > +        arc4random_getrandom_failure (); /* Weird, should never happen. */
> > +      else if (errno == ENOSYS)
> > +        {
> > +          have_getrandom = false;
> > +          break; /* No syscall, so fallback to /dev/urandom. */
> > +        }
> > +      arc4random_getrandom_failure (); /* Unknown error, should never happen. */
>
> Isn't EINTR also possible? Aborting in that case does not seem reasonable.

Not in current kernels, where it always returns at least PAGE_SIZE bytes
before checking for pending signals. In older kernels, if there was a
signal pending at the top, it would do no work and return -ERESTARTSYS,
which I believe should then get restarted by glibc's syscaller? I might
be wrong about how restarts work though, so if you know better, please
let me know. TEMP_FAILURE_RETRY relies on errno, so that's not what we
want. I guess I can just add a case for it.

> Also the __getrandom_nocancel function does not set errno on Linux; it
> just returns INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags).
> So unless that is changed, it doesn't look like this ENOSYS check will
> detect old Linux kernels.

Thanks. It looks like INTERNAL_SYSCALL_CALL just returns the errno as-is
as a return value, right? I'll adjust the code to account for that.

> > +      struct pollfd pfd = { .events = POLLIN };
> > +      pfd.fd = TEMP_FAILURE_RETRY (
> > +          __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> > +      if (pfd.fd < 0)
> > +        arc4random_getrandom_failure ();
> > +      if (__poll (&pfd, 1, -1) < 0)
> > +        arc4random_getrandom_failure ();
> > +      if (__close_nocancel (pfd.fd) < 0)
> > +        arc4random_getrandom_failure ();
>
> The TEMP_FAILURE_RETRY handles EINTR on open, but __poll can also
> result in EINTR.

Thanks. I'll surround the __poll in TEMP_FAILURE_RETRY.

Thank you for the review! v3 will have the above changes.

Jason

^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v2] arc4random: simplify design for better safety
From: Florian Weimer @ 2022-07-26 11:06 UTC
To: Jason A. Donenfeld; +Cc: Mark Harris, libc-alpha, linux-crypto

* Jason A. Donenfeld:

> Not in current kernels, where it always returns at least PAGE_SIZE bytes
> before checking for pending signals. In older kernels, if there was a
> signal pending at the top, it would do no work and return -ERESTARTSYS,
> which I believe should then get restarted by glibc's syscaller?

glibc does not handle ERESTARTSYS, it's a kernel-internal error code
that's not exported in UAPI headers and must not leak to userspace
(except perhaps via ptrace). I believe restarts are handled in the
kernel signal code, by tweaking the program counter. Looking at that,
ERESTARTSYS gets translated to EINTR for !SA_RESTART system calls:

        /* Are we from a system call? */
        if (syscall_get_nr(current, regs) != -1) {
                /* If so, check system call restarting.. */
                switch (syscall_get_error(current, regs)) {
                case -ERESTART_RESTARTBLOCK:
                case -ERESTARTNOHAND:
                        regs->ax = -EINTR;
                        break;

                case -ERESTARTSYS:
                        if (!(ksig->ka.sa.sa_flags & SA_RESTART)) {
                                regs->ax = -EINTR;
                                break;
                        }
                        fallthrough;
                case -ERESTARTNOINTR:
                        regs->ax = regs->orig_ax;
                        regs->ip -= 2;
                        break;
                }
        }

(arch/x86/kernel/signal.c)

Thanks,
Florian
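The userspace-visible side of the restart logic quoted above can be observed with a small experiment. This is only a sketch under stated assumptions: it uses a blocking read() on an empty pipe rather than getrandom(), so that the call reliably sleeps until the signal arrives, and it assumes the 50 ms timer does not fire before read() is entered. The names on_alarm, wake_fd and read_was_interrupted are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Write end of the pipe, used by the handler to unblock a restarted read.  */
static volatile sig_atomic_t wake_fd = -1;

static void
on_alarm (int sig)
{
  char c = 'x';
  (void) sig;
  /* write is async-signal-safe; the result is deliberately ignored.  */
  ssize_t r = write (wake_fd, &c, 1);
  (void) r;
}

/* Hypothetical experiment: returns 1 if the blocked read failed with
   EINTR, 0 if the kernel restarted it and it then returned the byte
   written by the handler, and -1 on a setup failure.  */
static int
read_was_interrupted (int use_sa_restart)
{
  int fds[2];
  struct sigaction sa, old;
  struct itimerval it = { .it_value = { .tv_sec = 0, .tv_usec = 50000 } };
  char c;

  assert (use_sa_restart == 0 || use_sa_restart == 1);
  if (pipe (fds) < 0)
    return -1;
  wake_fd = fds[1];

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_alarm;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
  if (sigaction (SIGALRM, &sa, &old) < 0
      || setitimer (ITIMER_REAL, &it, NULL) < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  /* Blocks on the empty pipe until SIGALRM arrives.  */
  ssize_t n = read (fds[0], &c, 1);
  int saved_errno = errno;

  sigaction (SIGALRM, &old, NULL);
  close (fds[0]);
  close (fds[1]);

  if (n < 0 && saved_errno == EINTR)
    return 1;
  return n == 1 ? 0 : -1;
}
```

Without SA_RESTART the caller sees EINTR and has to retry by hand. With SA_RESTART the kernel re-executes the call and userspace never sees the interruption.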
* Re: [PATCH v2] arc4random: simplify design for better safety
From: Mark Harris @ 2022-07-26 16:51 UTC
To: Jason A. Donenfeld; +Cc: libc-alpha, Florian Weimer, linux-crypto

Jason A. Donenfeld wrote:
> On Mon, Jul 25, 2022 at 06:10:06PM -0700, Mark Harris wrote:
> > Jason A. Donenfeld wrote:
> > > +      l = __getrandom_nocancel (p, n, 0);
> > > +      if (l > 0)
> > > +        {
> > > +          if ((size_t) l == n)
> > > +            return; /* Done reading, success. */
> > > +          p = (uint8_t *) p + l;
> > > +          n -= l;
> > > +          continue; /* Interrupted by a signal; keep going. */
> > > +        }
> > > +      else if (l == 0)
> > > +        arc4random_getrandom_failure (); /* Weird, should never happen. */
> > > +      else if (errno == ENOSYS)
> > > +        {
> > > +          have_getrandom = false;
> > > +          break; /* No syscall, so fallback to /dev/urandom. */
> > > +        }
> > > +      arc4random_getrandom_failure (); /* Unknown error, should never happen. */
> >
> > Isn't EINTR also possible? Aborting in that case does not seem reasonable.
>
> Not in current kernels, where it always returns at least PAGE_SIZE bytes
> before checking for pending signals. In older kernels, if there was a
> signal pending at the top, it would do no work and return -ERESTARTSYS,
> which I believe should then get restarted by glibc's syscaller? I might
> be wrong about how restarts work though, so if you know better, please
> let me know. TEMP_FAILURE_RETRY relies on errno, so that's not what we
> want. I guess I can just add a case for it.
>
> > Also the __getrandom_nocancel function does not set errno on Linux; it
> > just returns INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags).
> > So unless that is changed, it doesn't look like this ENOSYS check will
> > detect old Linux kernels.
>
> Thanks. It looks like INTERNAL_SYSCALL_CALL just returns the errno as-is
> as a return value, right? I'll adjust the code to account for that.

Yes INTERNAL_SYSCALL_CALL just returns the negated errno value that it
gets from the Linux kernel, but only on Linux does
__getrandom_nocancel use that. The Hurd and generic implementations
set errno on error. Previously the only call to this function did not
care about the specific error value so it didn't matter. Since you
are now using the error value in generic code, __getrandom_nocancel
should be changed on Linux to set errno like most other _nocancel
calls, and then it should go back to checking errno here.

And as Adhemerval mentioned, you only added a Linux implementation of
__ppoll_infinity_nocancel, but are calling it from generic code.

Also, by the way your patches cc'd directly to me get quarantined
because DKIM signature verification failed. The non-patch messages
pass DKIM and are fine.

 - Mark
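The two error conventions discussed here can be contrasted in a short sketch. It is an illustration only and does not use glibc's internal macros: raw_to_libc mimics, as an assumption about their general shape, what errno-setting wrappers do with a raw kernel return value (the kernel reserves -4095..-1 for error codes), and classify_raw_error mirrors a check on the negated value. Both names and the enum are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of inspecting a failed getrandom call.  */
enum rand_action { ACT_RETRY, ACT_FALLBACK, ACT_FATAL };

/* Convert a raw kernel-style return value (negated errno on failure)
   into the libc convention: -1 with errno set.  */
static long
raw_to_libc (long raw)
{
  if (raw < 0 && raw >= -4095L)
    {
      errno = (int) -raw;
      return -1;
    }
  return raw;
}

/* Classify a raw error return without touching errno.  */
static enum rand_action
classify_raw_error (long raw)
{
  assert (raw < 0); /* Only meaningful on the error path.  */
  if (raw == -EINTR)
    return ACT_RETRY;    /* Interrupted by a signal; call again.  */
  if (raw == -ENOSYS)
    return ACT_FALLBACK; /* No syscall; use the device node instead.  */
  return ACT_FATAL;
}
```

Generic code that checks errno only works when every platform's implementation follows the second convention, which is the portability point raised in the message above.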
* Re: [PATCH v2] arc4random: simplify design for better safety
From: Jason A. Donenfeld @ 2022-07-26 18:42 UTC
To: Mark Harris; +Cc: libc-alpha, Florian Weimer, linux-crypto

Hi Mark,

On Tue, Jul 26, 2022 at 09:51:03AM -0700, Mark Harris wrote:
> > Thanks. It looks like INTERNAL_SYSCALL_CALL just returns the errno as-is
> > as a return value, right? I'll adjust the code to account for that.
>
> Yes INTERNAL_SYSCALL_CALL just returns the negated errno value that it
> gets from the Linux kernel, but only on Linux does
> __getrandom_nocancel use that. The Hurd and generic implementations
> set errno on error. Previously the only call to this function did not
> care about the specific error value so it didn't matter. Since you
> are now using the error value in generic code, __getrandom_nocancel
> should be changed on Linux to set errno like most other _nocancel
> calls, and then it should go back to checking errno here.
>
> And as Adhemerval mentioned, you only added a Linux implementation of
> __ppoll_infinity_nocancel, but are calling it from generic code.

Okay, I'll switch this to use INLINE_SYSCALL_CALL, so that it sets
errno, and then will use the normal TEMP_FAILURE_RETRY macro for EINTR.

> Also, by the way your patches cc'd directly to me get quarantined
> because DKIM signature verification failed. The non-patch messages
> pass DKIM and are fine.

That sure is odd. The emails are all going through the MTA. rspamd bug?
OpenSMTPD bug? Hmm...

Jason
* Re: [PATCH v2] arc4random: simplify design for better safety
From: Adhemerval Zanella Netto @ 2022-07-26 19:18 UTC
To: libc-alpha

On 26/07/22 15:42, Jason A. Donenfeld via Libc-alpha wrote:
> Hi Mark,
>
> On Tue, Jul 26, 2022 at 09:51:03AM -0700, Mark Harris wrote:
>>> Thanks. It looks like INTERNAL_SYSCALL_CALL just returns the errno as-is
>>> as a return value, right? I'll adjust the code to account for that.
>>
>> Yes INTERNAL_SYSCALL_CALL just returns the negated errno value that it
>> gets from the Linux kernel, but only on Linux does
>> __getrandom_nocancel use that. The Hurd and generic implementations
>> set errno on error. Previously the only call to this function did not
>> care about the specific error value so it didn't matter. Since you
>> are now using the error value in generic code, __getrandom_nocancel
>> should be changed on Linux to set errno like most other _nocancel
>> calls, and then it should go back to checking errno here.
>>
>> And as Adhemerval mentioned, you only added a Linux implementation of
>> __ppoll_infinity_nocancel, but are calling it from generic code.
>
> Okay, I'll switch this to use INLINE_SYSCALL_CALL, so that it sets
> errno, and then will use the normal TEMP_FAILURE_RETRY macro for EINTR.
>
>> Also, by the way your patches cc'd directly to me get quarantined
>> because DKIM signature verification failed. The non-patch messages
>> pass DKIM and are fine.
>
> That sure is odd. The emails are all going through the MTA. rspamd bug?
> OpenSMTPD bug? Hmm...

I am having a similar issue, where my company email server (which is
google in the end) is marking your patches as spam.
* Re: [PATCH v2] arc4random: simplify design for better safety
From: Jason A. Donenfeld @ 2022-07-26 19:24 UTC
To: Mark Harris; +Cc: libc-alpha, Florian Weimer, linux-crypto

On Tue, Jul 26, 2022 at 08:42:51PM +0200, Jason A. Donenfeld wrote:
> Hi Mark,
>
> On Tue, Jul 26, 2022 at 09:51:03AM -0700, Mark Harris wrote:
> > > Thanks. It looks like INTERNAL_SYSCALL_CALL just returns the errno as-is
> > > as a return value, right? I'll adjust the code to account for that.
> >
> > Yes INTERNAL_SYSCALL_CALL just returns the negated errno value that it
> > gets from the Linux kernel, but only on Linux does
> > __getrandom_nocancel use that. The Hurd and generic implementations
> > set errno on error. Previously the only call to this function did not
> > care about the specific error value so it didn't matter. Since you
> > are now using the error value in generic code, __getrandom_nocancel
> > should be changed on Linux to set errno like most other _nocancel
> > calls, and then it should go back to checking errno here.
> >
> > And as Adhemerval mentioned, you only added a Linux implementation of
> > __ppoll_infinity_nocancel, but are calling it from generic code.
>
> Okay, I'll switch this to use INLINE_SYSCALL_CALL, so that it sets
> errno, and then will use the normal TEMP_FAILURE_RETRY macro for EINTR.
>
> > Also, by the way your patches cc'd directly to me get quarantined
> > because DKIM signature verification failed. The non-patch messages
> > pass DKIM and are fine.
>
> That sure is odd. The emails are all going through the MTA. rspamd bug?
> OpenSMTPD bug? Hmm...

It's because LICENSES has a ^L in it, which I guess doesn't go over well
with OpenSMTPD or rspamd or kernel.org's SMTP server or some combination
thereof...

I just posted v5, by the way, in case it's in your spam folder.

Jason
* Re: [PATCH v2] arc4random: simplify design for better safety
From: Florian Weimer @ 2022-07-26 9:55 UTC
To: Jason A. Donenfeld
Cc: libc-alpha, Adhemerval Zanella Netto, Cristian Rodríguez, Paul Eggert, linux-crypto

* Jason A. Donenfeld:

> +      pfd.fd = TEMP_FAILURE_RETRY (
> +        __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> +      if (pfd.fd < 0)
> +        arc4random_getrandom_failure ();
> +      if (__poll (&pfd, 1, -1) < 0)
> +        arc4random_getrandom_failure ();
> +      if (__close_nocancel (pfd.fd) < 0)
> +        arc4random_getrandom_failure ();

What happens if /dev/random is actually /dev/urandom? Will the poll
call fail?

I think we need a no-cancel variant of poll here, and we also need to
handle EINTR gracefully.

Performance-wise, my 1000 element shuffle benchmark runs about 14 times
slower without userspace buffering. (For comparison, just removing
ChaCha20 while keeping a 256-byte buffer makes it run roughly 25% slower
than current master.) Our random() implementation is quite slow, so
arc4random() as a replacement call is competitive. The unbuffered
version, not so much.

Running the benchmark, I see 40% of the time spent in chacha_permute in
the kernel, that is really quite odd. Why doesn't the system call
overhead dominate?

Thanks,
Florian
* Re: [PATCH v2] arc4random: simplify design for better safety
From: Jason A. Donenfeld @ 2022-07-26 11:04 UTC
To: Florian Weimer
Cc: libc-alpha, Adhemerval Zanella Netto, Cristian Rodríguez, Paul Eggert, linux-crypto

Hi Florian,

On Tue, Jul 26, 2022 at 11:55:23AM +0200, Florian Weimer wrote:
> * Jason A. Donenfeld:
>
> > +      pfd.fd = TEMP_FAILURE_RETRY (
> > +        __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> > +      if (pfd.fd < 0)
> > +        arc4random_getrandom_failure ();
> > +      if (__poll (&pfd, 1, -1) < 0)
> > +        arc4random_getrandom_failure ();
> > +      if (__close_nocancel (pfd.fd) < 0)
> > +        arc4random_getrandom_failure ();
>
> What happens if /dev/random is actually /dev/urandom? Will the poll
> call fail?

Yes. I'm unsure if you're asking this because it'd be a nice
simplification to only have to open one fd, or because you're worried
about confusion. I don't think the confusion problem is one we should
take too seriously, but if you're concerned, we can always fstat and
check the maj/min. Seems a bit much, though.

> I think we need a no-cancel variant of poll here, and we also need to
> handle EINTR gracefully.

Thanks for the note about poll nocancel. I'll try to add this. I don't
totally know how to manage that plumbing, but I'll give it my best shot.

> Performance-wise, my 1000 element shuffle benchmark runs about 14 times
> slower without userspace buffering. (For comparison, just removing
> ChaCha20 while keeping a 256-byte buffer makes it run roughly 25% slower
> than current master.) Our random() implementation is quite slow, so
> arc4random() as a replacement call is competitive. The unbuffered
> version, not so much.

Yes, as mentioned, this is slower. But let's get something down first
that's *correct*, and then after we can start optimizing it. Let's not
prematurely optimize and create a problematic function that nobody
should use.

> Running the benchmark, I see 40% of the time spent in chacha_permute in
> the kernel, that is really quite odd. Why doesn't the system call
> overhead dominate?

Huh, that is interesting. I guess if you're reading 4 bytes for an
integer, it winds up computing a whole chacha block each time, with half
of it doing fast key erasure and half of it being returnable to the
caller. When we later figure out a safer way to buffer, ostensibly this
will go away. But for now, we really should not prematurely optimize.

I'll have v3 out shortly with your suggested fixes.

Jason
* [PATCH v3] arc4random: simplify design for better safety
From: Jason A. Donenfeld @ 2022-07-26 11:07 UTC
To: libc-alpha
Cc: Jason A. Donenfeld, Adhemerval Zanella Netto, Florian Weimer, Cristian Rodríguez, Paul Eggert, Mark Harris, Eric Biggers, linux-crypto

Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc can work
together on optimizing that.

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering altogether.
Future improvements may include userspace learning more from the kernel
about when to do that, which might make these sorts of chacha20-based
optimizations more possible. The current heuristic of 16 MiB is
meaningless garbage that doesn't correspond to anything the kernel might
know about. So for now, let's just do something conservative that we
know is correct and won't lead to cryptographic issues for users of this
function.

This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things. Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fall back to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, but it does to a decent swath of them, so generally
it's the best approximation we can do.

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A.
Donenfeld <Jason@zx2c4.com> --- LICENSES | 23 - NEWS | 4 +- include/stdlib.h | 3 - io/Versions | 1 + manual/math.texi | 13 +- stdlib/Makefile | 2 - stdlib/arc4random.c | 206 ++----- stdlib/arc4random.h | 48 -- stdlib/chacha20.c | 191 ------ stdlib/tst-arc4random-chacha20.c | 167 ----- sysdeps/aarch64/Makefile | 4 - sysdeps/aarch64/chacha20-aarch64.S | 314 ---------- sysdeps/aarch64/chacha20_arch.h | 40 -- sysdeps/generic/not-cancel.h | 2 + sysdeps/generic/tls-internal-struct.h | 1 - sysdeps/generic/tls-internal.c | 10 - sysdeps/mach/hurd/_Fork.c | 2 - sysdeps/nptl/_Fork.c | 2 - .../powerpc/powerpc64/be/multiarch/Makefile | 4 - .../powerpc64/be/multiarch/chacha20-ppc.c | 1 - .../powerpc64/be/multiarch/chacha20_arch.h | 42 -- sysdeps/powerpc/powerpc64/power8/Makefile | 5 - .../powerpc/powerpc64/power8/chacha20-ppc.c | 256 -------- .../powerpc/powerpc64/power8/chacha20_arch.h | 37 -- sysdeps/s390/s390-64/Makefile | 6 - sysdeps/s390/s390-64/chacha20-s390x.S | 573 ------------------ sysdeps/s390/s390-64/chacha20_arch.h | 45 -- sysdeps/unix/sysv/linux/Makefile | 3 +- sysdeps/unix/sysv/linux/Versions | 1 + sysdeps/unix/sysv/linux/not-cancel.h | 5 + .../sysv/linux/poll_nocancel.c} | 16 +- sysdeps/unix/sysv/linux/tls-internal.c | 10 - sysdeps/unix/sysv/linux/tls-internal.h | 1 - sysdeps/x86_64/Makefile | 7 - sysdeps/x86_64/chacha20-amd64-avx2.S | 328 ---------- sysdeps/x86_64/chacha20-amd64-sse2.S | 311 ---------- sysdeps/x86_64/chacha20_arch.h | 55 -- 37 files changed, 81 insertions(+), 2658 deletions(-) delete mode 100644 stdlib/arc4random.h delete mode 100644 stdlib/chacha20.c delete mode 100644 stdlib/tst-arc4random-chacha20.c delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S delete mode 100644 sysdeps/aarch64/chacha20_arch.h delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h delete mode 100644 
sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h rename sysdeps/{generic/chacha20_arch.h => unix/sysv/linux/poll_nocancel.c} (68%) delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S delete mode 100644 sysdeps/x86_64/chacha20_arch.h diff --git a/LICENSES b/LICENSES index cd04fb6e84..530893b1dc 100644 --- a/LICENSES +++ b/LICENSES @@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov> You should have received a copy of the GNU Lesser General Public License along with this library; if not, see <https://www.gnu.org/licenses/>. */ -\f -sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S, -sysdeps/x86_64/chacha20-amd64-avx2.S, and -sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and -sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt, -with the following notices: - -Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - -This file is part of Libgcrypt. - -Libgcrypt is free software; you can redistribute it and/or modify -it under the terms of the GNU Lesser General Public License as -published by the Free Software Foundation; either version 2.1 of -the License, or (at your option) any later version. - -Libgcrypt is distributed in the hope that it will be useful, -but WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -GNU Lesser General Public License for more details. - -You should have received a copy of the GNU Lesser General Public -License along with this program; if not, see <https://www.gnu.org/licenses/>. 
diff --git a/NEWS b/NEWS index 8420a65cd0..fe531bfe1e 100644 --- a/NEWS +++ b/NEWS @@ -61,8 +61,8 @@ Major new features: is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type). * The functions arc4random, arc4random_buf, and arc4random_uniform have been - added. The functions use a pseudo-random number generator along with - entropy from the kernel. + added. The functions wrap getrandom and/or /dev/urandom to return high- + quality randomness from the kernel. Deprecated and removed features, and other changes affecting compatibility: diff --git a/include/stdlib.h b/include/stdlib.h index cae7f7cdf8..db51f4a4f6 100644 --- a/include/stdlib.h +++ b/include/stdlib.h @@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform; libc_hidden_proto (__arc4random_uniform); extern void __arc4random_buf_internal (void *buffer, size_t len) attribute_hidden; -/* Called from the fork function to reinitialize the internal cipher state - in child process. */ -extern void __arc4random_fork_subprocess (void) attribute_hidden; extern double __strtod_internal (const char *__restrict __nptr, char **__restrict __endptr, int __group) diff --git a/io/Versions b/io/Versions index 4e19540885..b8660023e2 100644 --- a/io/Versions +++ b/io/Versions @@ -145,6 +145,7 @@ libc { __fcntl_nocancel; __open64_nocancel; __write_nocancel; + __poll_nocancel; __file_is_unchanged; __file_change_detection_for_stat; __file_change_detection_for_path; diff --git a/manual/math.texi b/manual/math.texi index 141695cc30..6d69bbff66 100644 --- a/manual/math.texi +++ b/manual/math.texi @@ -1993,17 +1993,10 @@ This section describes the random number functions provided as a GNU extension, based on OpenBSD interfaces. @Theglibc{} uses kernel entropy obtained either through @code{getrandom} -or by reading @file{/dev/urandom} to seed and periodically re-seed the -internal state. A per-thread data pool is used, which allows fast output -generation. 
+or by reading @file{/dev/urandom} to seed. -Although these functions provide higher random quality than ISO, BSD, and -SVID functions, these still use a Pseudo-Random generator and should not -be used in cryptographic contexts. - -The internal state is cleared and reseeded with kernel entropy on @code{fork} -and @code{_Fork}. It is not cleared on either a direct @code{clone} syscall -or when using @theglibc{} @code{syscall} function. +These functions provide higher random quality than ISO, BSD, and SVID +functions, and may be used in cryptographic contexts. The prototypes for these functions are in @file{stdlib.h}. @pindex stdlib.h diff --git a/stdlib/Makefile b/stdlib/Makefile index a900962685..f7b25c1981 100644 --- a/stdlib/Makefile +++ b/stdlib/Makefile @@ -246,7 +246,6 @@ tests := \ # tests tests-internal := \ - tst-arc4random-chacha20 \ tst-strtod1i \ tst-strtod3 \ tst-strtod4 \ @@ -256,7 +255,6 @@ tests-internal := \ # tests-internal tests-static := \ - tst-arc4random-chacha20 \ tst-secure-getenv \ # tests-static diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c index 65547e79aa..ee49c7f551 100644 --- a/stdlib/arc4random.c +++ b/stdlib/arc4random.c @@ -1,4 +1,4 @@ -/* Pseudo Random Number Generator based on ChaCha20. +/* Pseudo Random Number Generator Copyright (C) 2022 Free Software Foundation, Inc. This file is part of the GNU C Library. @@ -16,61 +16,14 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. */ -#include <arc4random.h> #include <errno.h> #include <not-cancel.h> #include <stdio.h> #include <stdlib.h> +#include <sys/poll.h> #include <sys/mman.h> #include <sys/param.h> #include <sys/random.h> -#include <tls-internal.h> - -/* arc4random keeps two counters: 'have' is the current valid bytes not yet - consumed in 'buf' while 'count' is the maximum number of bytes until a - reseed. - - Both the initial seed and reseed try to obtain entropy from the kernel - and abort the process if none could be obtained. 
- - The state 'buf' improves the usage of the cipher calls, allowing to call - optimized implementations (if the architecture provides it) and minimize - function call overhead. */ - -#include <chacha20.c> - -/* Called from the fork function to reset the state. */ -void -__arc4random_fork_subprocess (void) -{ - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state; - if (state != NULL) - { - explicit_bzero (state, sizeof (*state)); - /* Force key init. */ - state->count = -1; - } -} - -/* Return the current thread random state or try to create one if there is - none available. In the case malloc can not allocate a state, arc4random - will try to get entropy with arc4random_getentropy. */ -static struct arc4random_state_t * -arc4random_get_state (void) -{ - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state; - if (state == NULL) - { - state = malloc (sizeof (struct arc4random_state_t)); - if (state != NULL) - { - /* Force key initialization on first call. */ - state->count = -1; - __glibc_tls_internal ()->rand_state = state; - } - } - return state; -} static void arc4random_getrandom_failure (void) @@ -78,106 +31,72 @@ arc4random_getrandom_failure (void) __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n"); } -static void -arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen) +void +__arc4random_buf (void *p, size_t n) { - chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf); + static bool have_getrandom = true, seen_initialized = false; + int fd; - /* Mix optional user provided data. */ - if (rnd != NULL) - { - size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); - for (size_t i = 0; i < m; i++) - state->buf[i] ^= rnd[i]; - } - - /* Immediately reinit for backtracking resistance. 
*/ - chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE); - explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); - state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); -} - -static void -arc4random_getentropy (void *rnd, size_t len) -{ - if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len) + if (n == 0) return; - int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom", - O_RDONLY | O_CLOEXEC)); - if (fd != -1) + for (;;) { - uint8_t *p = rnd; - uint8_t *end = p + len; - do - { - ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p)); - if (ret <= 0) - arc4random_getrandom_failure (); - p += ret; - } - while (p < end); + ssize_t l; - if (__close_nocancel (fd) == 0) - return; - } - arc4random_getrandom_failure (); -} + if (!have_getrandom) + break; -/* Check if the thread context STATE should be reseed with kernel entropy - depending of requested LEN bytes. If there is less than requested, - the state is either initialized or reseeded, otherwise the internal - counter subtract the requested length. */ -static void -arc4random_check_stir (struct arc4random_state_t *state, size_t len) -{ - if (state->count <= len || state->count == -1) - { - uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE]; - arc4random_getentropy (rnd, sizeof rnd); - - if (state->count == -1) - chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE); - else - arc4random_rekey (state, rnd, sizeof rnd); - - explicit_bzero (rnd, sizeof rnd); - - /* Invalidate the buf. */ - state->have = 0; - memset (state->buf, 0, sizeof state->buf); - state->count = CHACHA20_RESEED_SIZE; + l = __getrandom_nocancel (p, n, 0); + if (l > 0) + { + if ((size_t) l == n) + return; /* Done reading, success. */ + p = (uint8_t *) p + l; + n -= l; + continue; /* Interrupted by a signal; keep going. */ + } + else if (l == 0) + arc4random_getrandom_failure (); /* Weird, should never happen. 
*/ + else if (l == -EINTR) + continue; /* Interrupted by a signal; keep going. */ + else if (l == -ENOSYS) + { + have_getrandom = false; + break; /* No syscall, so fallback to /dev/urandom. */ + } + arc4random_getrandom_failure (); /* Unknown error, should never happen. */ } - else - state->count -= len; -} -void -__arc4random_buf (void *buffer, size_t len) -{ - struct arc4random_state_t *state = arc4random_get_state (); - if (__glibc_unlikely (state == NULL)) + if (!seen_initialized) { - arc4random_getentropy (buffer, len); - return; + struct pollfd pfd = { .events = POLLIN }; + pfd.fd = TEMP_FAILURE_RETRY ( + __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY)); + if (pfd.fd < 0) + arc4random_getrandom_failure (); + if (TEMP_FAILURE_RETRY (__poll_nocancel (&pfd, 1, -1)) < 0) + arc4random_getrandom_failure (); + if (__close_nocancel (pfd.fd) < 0) + arc4random_getrandom_failure (); + seen_initialized = true; } - arc4random_check_stir (state, len); - while (len > 0) + fd = TEMP_FAILURE_RETRY ( + __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY)); + if (fd < 0) + arc4random_getrandom_failure (); + do { - if (state->have > 0) - { - size_t m = MIN (len, state->have); - uint8_t *ks = state->buf + sizeof (state->buf) - state->have; - memcpy (buffer, ks, m); - explicit_bzero (ks, m); - buffer += m; - len -= m; - state->have -= m; - } - if (state->have == 0) - arc4random_rekey (state, NULL, 0); + ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n)); + if (l <= 0) + arc4random_getrandom_failure (); + p = (uint8_t *) p + l; + n -= l; } + while (n); + if (__close_nocancel (fd) < 0) + arc4random_getrandom_failure (); } libc_hidden_def (__arc4random_buf) weak_alias (__arc4random_buf, arc4random_buf) @@ -186,22 +105,7 @@ uint32_t __arc4random (void) { uint32_t r; - - struct arc4random_state_t *state = arc4random_get_state (); - if (__glibc_unlikely (state == NULL)) - { - arc4random_getentropy (&r, sizeof (uint32_t)); - return r; - } - - 
arc4random_check_stir (state, sizeof (uint32_t)); - if (state->have < sizeof (uint32_t)) - arc4random_rekey (state, NULL, 0); - uint8_t *ks = state->buf + sizeof (state->buf) - state->have; - memcpy (&r, ks, sizeof (uint32_t)); - memset (ks, 0, sizeof (uint32_t)); - state->have -= sizeof (uint32_t); - + __arc4random_buf (&r, sizeof (r)); return r; } libc_hidden_def (__arc4random) diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h deleted file mode 100644 index cd39389c19..0000000000 --- a/stdlib/arc4random.h +++ /dev/null @@ -1,48 +0,0 @@ -/* Arc4random definition used on TLS. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#ifndef _CHACHA20_H -#define _CHACHA20_H - -#include <stddef.h> -#include <stdint.h> - -/* Internal ChaCha20 state. */ -#define CHACHA20_STATE_LEN 16 -#define CHACHA20_BLOCK_SIZE 64 - -/* Maximum number bytes until reseed (16 MB). */ -#define CHACHA20_RESEED_SIZE (16 * 1024 * 1024) - -/* Internal arc4random buffer, used on each feedback step so offer some - backtracking protection and to allow better used of vectorized - chacha20 implementations. 
*/ -#define CHACHA20_BUFSIZE (8 * CHACHA20_BLOCK_SIZE) - -_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE"); - -struct arc4random_state_t -{ - uint32_t ctx[CHACHA20_STATE_LEN]; - size_t have; - size_t count; - uint8_t buf[CHACHA20_BUFSIZE]; -}; - -#endif diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c deleted file mode 100644 index 2745a81315..0000000000 --- a/stdlib/chacha20.c +++ /dev/null @@ -1,191 +0,0 @@ -/* Generic ChaCha20 implementation (used on arc4random). - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <array_length.h> -#include <endian.h> -#include <stddef.h> -#include <stdint.h> -#include <string.h> - -/* 32-bit stream position, then 96-bit nonce. */ -#define CHACHA20_IV_SIZE 16 -#define CHACHA20_KEY_SIZE 32 - -#define CHACHA20_STATE_LEN 16 - -/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final - XOR of the keystream with the plaintext because the plaintext is a - stream of zeros. 
*/ - -enum chacha20_constants -{ - CHACHA20_CONSTANT_EXPA = 0x61707865U, - CHACHA20_CONSTANT_ND_3 = 0x3320646eU, - CHACHA20_CONSTANT_2_BY = 0x79622d32U, - CHACHA20_CONSTANT_TE_K = 0x6b206574U -}; - -static inline uint32_t -read_unaligned_32 (const uint8_t *p) -{ - uint32_t r; - memcpy (&r, p, sizeof (r)); - return r; -} - -static inline void -write_unaligned_32 (uint8_t *p, uint32_t v) -{ - memcpy (p, &v, sizeof (v)); -} - -#if __BYTE_ORDER == __BIG_ENDIAN -# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p)) -# define set_state(v) __builtin_bswap32 ((v)) -#else -# define read_unaligned_le32(p) read_unaligned_32 ((p)) -# define set_state(v) (v) -#endif - -static inline void -chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv) -{ - state[0] = CHACHA20_CONSTANT_EXPA; - state[1] = CHACHA20_CONSTANT_ND_3; - state[2] = CHACHA20_CONSTANT_2_BY; - state[3] = CHACHA20_CONSTANT_TE_K; - - state[4] = read_unaligned_le32 (key + 0 * sizeof (uint32_t)); - state[5] = read_unaligned_le32 (key + 1 * sizeof (uint32_t)); - state[6] = read_unaligned_le32 (key + 2 * sizeof (uint32_t)); - state[7] = read_unaligned_le32 (key + 3 * sizeof (uint32_t)); - state[8] = read_unaligned_le32 (key + 4 * sizeof (uint32_t)); - state[9] = read_unaligned_le32 (key + 5 * sizeof (uint32_t)); - state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t)); - state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t)); - - state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t)); - state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t)); - state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t)); - state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t)); -} - -static inline uint32_t -rotl32 (unsigned int shift, uint32_t word) -{ - return (word << (shift & 31)) | (word >> ((-shift) & 31)); -} - -static void -state_final (const uint8_t *src, uint8_t *dst, uint32_t v) -{ -#ifdef CHACHA20_XOR_FINAL - v ^= read_unaligned_32 (src); -#endif - 
write_unaligned_32 (dst, v); -} - -static inline void -chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src) -{ - uint32_t x0, x1, x2, x3, x4, x5, x6, x7; - uint32_t x8, x9, x10, x11, x12, x13, x14, x15; - - x0 = state[0]; - x1 = state[1]; - x2 = state[2]; - x3 = state[3]; - x4 = state[4]; - x5 = state[5]; - x6 = state[6]; - x7 = state[7]; - x8 = state[8]; - x9 = state[9]; - x10 = state[10]; - x11 = state[11]; - x12 = state[12]; - x13 = state[13]; - x14 = state[14]; - x15 = state[15]; - - for (int i = 0; i < 20; i += 2) - { -#define QROUND(_x0, _x1, _x2, _x3) \ - do { \ - _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); \ - _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); \ - _x0 = _x0 + _x1; _x3 = rotl32 (8, (_x0 ^ _x3)); \ - _x2 = _x2 + _x3; _x1 = rotl32 (7, (_x1 ^ _x2)); \ - } while(0) - - QROUND (x0, x4, x8, x12); - QROUND (x1, x5, x9, x13); - QROUND (x2, x6, x10, x14); - QROUND (x3, x7, x11, x15); - - QROUND (x0, x5, x10, x15); - QROUND (x1, x6, x11, x12); - QROUND (x2, x7, x8, x13); - QROUND (x3, x4, x9, x14); - } - - state_final (&src[0], &dst[0], set_state (x0 + state[0])); - state_final (&src[4], &dst[4], set_state (x1 + state[1])); - state_final (&src[8], &dst[8], set_state (x2 + state[2])); - state_final (&src[12], &dst[12], set_state (x3 + state[3])); - state_final (&src[16], &dst[16], set_state (x4 + state[4])); - state_final (&src[20], &dst[20], set_state (x5 + state[5])); - state_final (&src[24], &dst[24], set_state (x6 + state[6])); - state_final (&src[28], &dst[28], set_state (x7 + state[7])); - state_final (&src[32], &dst[32], set_state (x8 + state[8])); - state_final (&src[36], &dst[36], set_state (x9 + state[9])); - state_final (&src[40], &dst[40], set_state (x10 + state[10])); - state_final (&src[44], &dst[44], set_state (x11 + state[11])); - state_final (&src[48], &dst[48], set_state (x12 + state[12])); - state_final (&src[52], &dst[52], set_state (x13 + state[13])); - state_final (&src[56], &dst[56], set_state (x14 + state[14])); 
- state_final (&src[60], &dst[60], set_state (x15 + state[15])); - - state[12]++; -} - -static void -__attribute_maybe_unused__ -chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - while (bytes >= CHACHA20_BLOCK_SIZE) - { - chacha20_block (state, dst, src); - - bytes -= CHACHA20_BLOCK_SIZE; - dst += CHACHA20_BLOCK_SIZE; - src += CHACHA20_BLOCK_SIZE; - } - - if (__glibc_unlikely (bytes != 0)) - { - uint8_t stream[CHACHA20_BLOCK_SIZE]; - chacha20_block (state, stream, src); - memcpy (dst, stream, bytes); - explicit_bzero (stream, sizeof stream); - } -} - -/* Get the architecture optimized version. */ -#include <chacha20_arch.h> diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c deleted file mode 100644 index 45ba54920d..0000000000 --- a/stdlib/tst-arc4random-chacha20.c +++ /dev/null @@ -1,167 +0,0 @@ -/* Basic tests for chacha20 cypher used in arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <arc4random.h> -#include <support/check.h> -#include <sys/cdefs.h> - -/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random - actual does. 
*/ -#include <chacha20.c> - -static int -do_test (void) -{ - const uint8_t key[CHACHA20_KEY_SIZE] = - { - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - }; - const uint8_t iv[CHACHA20_IV_SIZE] = - { - 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - }; - const uint8_t expected1[CHACHA20_BUFSIZE] = - { - 0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a, - 0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d, - 0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda, - 0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f, - 0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1, - 0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07, - 0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73, - 0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69, - 0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21, - 0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8, - 0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac, - 0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6, - 0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0, - 0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3, - 0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f, - 0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e, - 0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71, - 0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3, - 0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5, - 0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34, - 0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d, - 0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c, - 0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 
0x6a, 0x1e, 0x6c, 0x31, - 0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad, - 0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0, - 0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72, - 0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16, - 0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9, - 0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a, - 0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc, - 0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2, - 0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb, - 0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10, - 0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b, - 0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2, - 0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17, - 0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca, - 0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84, - 0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3, - 0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1, - 0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7, - 0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9, - 0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3, - 0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e, - 0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb, - 0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8, - 0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb - }; - - const uint8_t expected2[CHACHA20_BUFSIZE] = - { - 0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36, - 0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22, - 0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50, - 0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10, - 0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 
0xb8, 0x94, 0xbb, 0xef, - 0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30, - 0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c, - 0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6, - 0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e, - 0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90, - 0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3, - 0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba, - 0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4, - 0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea, - 0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a, - 0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80, - 0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2, - 0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8, - 0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce, - 0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb, - 0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9, - 0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc, - 0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86, - 0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0, - 0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4, - 0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1, - 0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4, - 0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f, - 0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57, - 0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97, - 0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef, - 0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13, - 0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65, - 0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30, - 
0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6, - 0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee, - 0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c, - 0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66, - 0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a, - 0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c, - 0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65, - 0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b, - 0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88, - 0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1, - 0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0, - 0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36, - 0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18 - }; - - /* Check with the expected internal arc4random keystream buffer. Some - architecture optimizations expects a buffer with a minimum size which - is a multiple of then ChaCha20 blocksize, so they might not be prepared - to handle smaller buffers. */ - - uint8_t output[CHACHA20_BUFSIZE]; - - uint32_t state[CHACHA20_STATE_LEN]; - chacha20_init (state, key, iv); - - /* Check with the initial state. */ - uint8_t input[CHACHA20_BUFSIZE] = { 0 }; - - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); - TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE); - - /* And on the next round. 
*/ - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); - TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE); - - return 0; -} - -#include <support/test-driver.c> diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile index 7dfd1b62dd..17fb1c5b72 100644 --- a/sysdeps/aarch64/Makefile +++ b/sysdeps/aarch64/Makefile @@ -51,10 +51,6 @@ ifeq ($(subdir),csu) gen-as-const-headers += tlsdesc.sym endif -ifeq ($(subdir),stdlib) -sysdep_routines += chacha20-aarch64 -endif - ifeq ($(subdir),gmon) CFLAGS-mcount.c += -mgeneral-regs-only endif diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S deleted file mode 100644 index cce5291c5c..0000000000 --- a/sysdeps/aarch64/chacha20-aarch64.S +++ /dev/null @@ -1,314 +0,0 @@ -/* Optimized AArch64 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. 
- - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. - */ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. */ - -#include <sysdep.h> - -/* Only LE is supported. */ -#ifdef __AARCH64EL__ - -#define GET_DATA_POINTER(reg, name) \ - adrp reg, name ; \ - add reg, reg, :lo12:name - -/* 'ret' instruction replacement for straight-line speculation mitigation */ -#define ret_spec_stop \ - ret; dsb sy; isb; - -.cpu generic+simd - -.text - -/* register macros */ -#define INPUT x0 -#define DST x1 -#define SRC x2 -#define NBLKS x3 -#define ROUND x4 -#define INPUT_CTR x5 -#define INPUT_POS x6 -#define CTR x7 - -/* vector registers */ -#define X0 v16 -#define X4 v17 -#define X8 v18 -#define X12 v19 - -#define X1 v20 -#define X5 v21 - -#define X9 v22 -#define X13 v23 -#define X2 v24 -#define X6 v25 - -#define X3 v26 -#define X7 v27 -#define X11 v28 -#define X15 v29 - -#define X10 v30 -#define X14 v31 - -#define VCTR v0 -#define VTMP0 v1 -#define VTMP1 v2 -#define VTMP2 v3 -#define VTMP3 v4 -#define X12_TMP v5 -#define X13_TMP v6 -#define ROT8 v7 - -/********************************************************************** - helper macros - **********************************************************************/ - -#define _(...) 
__VA_ARGS__ - -#define vpunpckldq(s1, s2, dst) \ - zip1 dst.4s, s2.4s, s1.4s; - -#define vpunpckhdq(s1, s2, dst) \ - zip2 dst.4s, s2.4s, s1.4s; - -#define vpunpcklqdq(s1, s2, dst) \ - zip1 dst.2d, s2.2d, s1.2d; - -#define vpunpckhqdq(s1, s2, dst) \ - zip2 dst.2d, s2.2d, s1.2d; - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \ - vpunpckhdq(x1, x0, t2); \ - vpunpckldq(x1, x0, x0); \ - \ - vpunpckldq(x3, x2, t1); \ - vpunpckhdq(x3, x2, x2); \ - \ - vpunpckhqdq(t1, x0, x1); \ - vpunpcklqdq(t1, x0, x0); \ - \ - vpunpckhqdq(x2, t2, x3); \ - vpunpcklqdq(x2, t2, x2); - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define XOR(d,s1,s2) \ - eor d.16b, s2.16b, s1.16b; - -#define PLUS(ds,s) \ - add ds.4s, ds.4s, s.4s; - -#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \ - shl dst1.4s, src1.4s, #(c); \ - shl dst2.4s, src2.4s, #(c); \ - shl dst3.4s, src3.4s, #(c); \ - shl dst4.4s, src4.4s, #(c); \ - sri dst1.4s, src1.4s, #(32 - (c)); \ - sri dst2.4s, src2.4s, #(32 - (c)); \ - sri dst3.4s, src3.4s, #(32 - (c)); \ - sri dst4.4s, src4.4s, #(32 - (c)); - -#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ - tbl dst1.16b, {src1.16b}, ROT8.16b; \ - tbl dst2.16b, {src2.16b}, ROT8.16b; \ - tbl dst3.16b, {src3.16b}, ROT8.16b; \ - tbl dst4.16b, {src4.16b}, ROT8.16b; - -#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ - rev32 dst1.8h, src1.8h; \ - rev32 dst2.8h, src2.8h; \ - rev32 dst3.8h, src3.8h; \ - rev32 dst4.8h, src4.8h; - -#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \ - PLUS(a1,b1); PLUS(a2,b2); \ - PLUS(a3,b3); PLUS(a4,b4); \ - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ - ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4); \ - PLUS(c1,d1); PLUS(c2,d2); \ - PLUS(c3,d3); PLUS(c4,d4); \ - XOR(tmp1,b1,c1); 
XOR(tmp2,b2,c2); \ - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ - ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4) \ - PLUS(a1,b1); PLUS(a2,b2); \ - PLUS(a3,b3); PLUS(a4,b4); \ - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ - ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4) \ - PLUS(c1,d1); PLUS(c2,d2); \ - PLUS(c3,d3); PLUS(c4,d4); \ - XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \ - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ - ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4) \ - -.align 4 -L(__chacha20_blocks4_data_inc_counter): - .long 0,1,2,3 - -.align 4 -L(__chacha20_blocks4_data_rot8): - .byte 3,0,1,2 - .byte 7,4,5,6 - .byte 11,8,9,10 - .byte 15,12,13,14 - -.hidden __chacha20_neon_blocks4 -ENTRY (__chacha20_neon_blocks4) - /* input: - * x0: input - * x1: dst - * x2: src - * x3: nblks (multiple of 4) - */ - - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8)) - add INPUT_CTR, INPUT, #(12*4); - ld1 {ROT8.16b}, [CTR]; - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter)) - mov INPUT_POS, INPUT; - ld1 {VCTR.16b}, [CTR]; - -L(loop4): - /* Construct counter vectors X12 and X13 */ - - ld1 {X15.16b}, [INPUT_CTR]; - mov ROUND, #20; - ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS]; - - dup X12.4s, X15.s[0]; - dup X13.4s, X15.s[1]; - ldr CTR, [INPUT_CTR]; - add X12.4s, X12.4s, VCTR.4s; - dup X0.4s, VTMP1.s[0]; - dup X1.4s, VTMP1.s[1]; - dup X2.4s, VTMP1.s[2]; - dup X3.4s, VTMP1.s[3]; - dup X14.4s, X15.s[2]; - cmhi VTMP0.4s, VCTR.4s, X12.4s; - dup X15.4s, X15.s[3]; - add CTR, CTR, #4; /* Update counter */ - dup X4.4s, VTMP2.s[0]; - dup X5.4s, VTMP2.s[1]; - dup X6.4s, VTMP2.s[2]; - dup X7.4s, VTMP2.s[3]; - sub X13.4s, X13.4s, VTMP0.4s; - dup X8.4s, VTMP3.s[0]; - dup X9.4s, VTMP3.s[1]; - dup X10.4s, VTMP3.s[2]; - dup X11.4s, VTMP3.s[3]; - mov X12_TMP.16b, X12.16b; - mov X13_TMP.16b, X13.16b; - str CTR, [INPUT_CTR]; - -L(round2): - subs ROUND, ROUND, #2 - QUARTERROUND4(X0, X4, X8, X12, X1, X5, X9, X13, - X2, X6, X10, X14, X3, X7, X11, X15, - 
tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) - QUARTERROUND4(X0, X5, X10, X15, X1, X6, X11, X12, - X2, X7, X8, X13, X3, X4, X9, X14, - tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) - b.ne L(round2); - - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32; - - PLUS(X12, X12_TMP); /* INPUT + 12 * 4 + counter */ - PLUS(X13, X13_TMP); /* INPUT + 13 * 4 + counter */ - - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */ - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */ - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */ - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */ - PLUS(X0, VTMP2); - PLUS(X1, VTMP3); - PLUS(X2, X12_TMP); - PLUS(X3, X13_TMP); - - dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */ - dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */ - dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */ - dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */ - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS]; - mov INPUT_POS, INPUT; - PLUS(X4, VTMP2); - PLUS(X5, VTMP3); - PLUS(X6, X12_TMP); - PLUS(X7, X13_TMP); - - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */ - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */ - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */ - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */ - dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */ - dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */ - PLUS(X8, VTMP2); - PLUS(X9, VTMP3); - PLUS(X10, X12_TMP); - PLUS(X11, X13_TMP); - PLUS(X14, VTMP0); - PLUS(X15, VTMP1); - - transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2); - transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2); - transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2); - transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2); - - subs NBLKS, NBLKS, #4; - - st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64 - st1 {X1.16b,X5.16b}, [DST], #32; - st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64 - st1 {X10.16b,X14.16b}, [DST], #32; - st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64; - - b.ne L(loop4); - - ret_spec_stop -END (__chacha20_neon_blocks4) - -#endif diff --git a/sysdeps/aarch64/chacha20_arch.h 
b/sysdeps/aarch64/chacha20_arch.h deleted file mode 100644 index 37dbb917f1..0000000000 --- a/sysdeps/aarch64/chacha20_arch.h +++ /dev/null @@ -1,40 +0,0 @@ -/* Chacha20 implementation, used on arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <ldsodefs.h> -#include <stdbool.h> - -unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, - "CHACHA20_BUFSIZE not multiple of 4"); - _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4, - "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4"); -#ifdef __AARCH64EL__ - __chacha20_neon_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -#else - chacha20_crypt_generic (state, dst, src, bytes); -#endif -} diff --git a/sysdeps/generic/not-cancel.h b/sysdeps/generic/not-cancel.h index acceb9b67f..bd60643599 100644 --- a/sysdeps/generic/not-cancel.h +++ b/sysdeps/generic/not-cancel.h @@ -50,5 +50,7 @@ __fcntl64 (fd, cmd, __VA_ARGS__) #define __getrandom_nocancel(buf, size, flags) \ __getrandom (buf, size, flags) +#define __poll_nocancel(fds, nfds, timeout) \ + __poll (fds, nfds, timeout) #endif /*
NOT_CANCEL_H */ diff --git a/sysdeps/generic/tls-internal-struct.h b/sysdeps/generic/tls-internal-struct.h index a91915831b..d76c715a96 100644 --- a/sysdeps/generic/tls-internal-struct.h +++ b/sysdeps/generic/tls-internal-struct.h @@ -23,7 +23,6 @@ struct tls_internal_t { char *strsignal_buf; char *strerror_l_buf; - struct arc4random_state_t *rand_state; }; #endif diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c index 8a0f37d509..b32b31b5a9 100644 --- a/sysdeps/generic/tls-internal.c +++ b/sysdeps/generic/tls-internal.c @@ -16,7 +16,6 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. */ -#include <stdlib/arc4random.h> #include <string.h> #include <tls-internal.h> @@ -27,13 +26,4 @@ __glibc_tls_internal_free (void) { free (__tls_internal.strsignal_buf); free (__tls_internal.strerror_l_buf); - - if (__tls_internal.rand_state != NULL) - { - /* Clear any lingering random state prior so if the thread stack is - cached it won't leak any data. */ - explicit_bzero (__tls_internal.rand_state, - sizeof (*__tls_internal.rand_state)); - free (__tls_internal.rand_state); - } } diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c index 667068c8cf..e60b86fab1 100644 --- a/sysdeps/mach/hurd/_Fork.c +++ b/sysdeps/mach/hurd/_Fork.c @@ -662,8 +662,6 @@ retry: _hurd_malloc_fork_child (); call_function_static_weak (__malloc_fork_unlock_child); - call_function_static_weak (__arc4random_fork_subprocess); - /* Run things that want to run in the child task to set up. 
*/ RUN_HOOK (_hurd_fork_child_hook, ()); diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c index 7dc02569f6..dd568992e2 100644 --- a/sysdeps/nptl/_Fork.c +++ b/sysdeps/nptl/_Fork.c @@ -43,8 +43,6 @@ _Fork (void) self->robust_head.list = &self->robust_head; INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head, sizeof (struct robust_list_head)); - - call_function_static_weak (__arc4random_fork_subprocess); } return pid; } diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile deleted file mode 100644 index 8c75165f7f..0000000000 --- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile +++ /dev/null @@ -1,4 +0,0 @@ -ifeq ($(subdir),stdlib) -sysdep_routines += chacha20-ppc -CFLAGS-chacha20-ppc.c += -mcpu=power8 -endif diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c deleted file mode 100644 index cf9e735326..0000000000 --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c +++ /dev/null @@ -1 +0,0 @@ -#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h deleted file mode 100644 index 08494dc045..0000000000 --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h +++ /dev/null @@ -1,42 +0,0 @@ -/* PowerPC optimization for ChaCha20. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <stdbool.h> -#include <ldsodefs.h> - -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static void -chacha20_crypt (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, - "CHACHA20_BUFSIZE not multiple of 4"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); - - unsigned long int hwcap = GLRO(dl_hwcap); - unsigned long int hwcap2 = GLRO(dl_hwcap2); - if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC) - __chacha20_power8_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); - else - chacha20_crypt_generic (state, dst, src, bytes); -} diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile index abb0aa3f11..71a59529f3 100644 --- a/sysdeps/powerpc/powerpc64/power8/Makefile +++ b/sysdeps/powerpc/powerpc64/power8/Makefile @@ -1,8 +1,3 @@ ifeq ($(subdir),string) sysdep_routines += strcasestr-ppc64 endif - -ifeq ($(subdir),stdlib) -sysdep_routines += chacha20-ppc -CFLAGS-chacha20-ppc.c += -mcpu=power8 -endif diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c deleted file mode 100644 index 0bbdcb9363..0000000000 --- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c +++ /dev/null @@ -1,256 +0,0 @@ -/* Optimized PowerPC implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. 
- - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20 - Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. 
- */ - -#include <altivec.h> -#include <endian.h> -#include <stddef.h> -#include <stdint.h> -#include <sys/cdefs.h> - -typedef vector unsigned char vector16x_u8; -typedef vector unsigned int vector4x_u32; -typedef vector unsigned long long vector2x_u64; - -#if __BYTE_ORDER == __BIG_ENDIAN -static const vector16x_u8 le_bswap_const = - { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 }; -#endif - -static inline vector4x_u32 -vec_rol_elems (vector4x_u32 v, unsigned int idx) -{ -#if __BYTE_ORDER != __BIG_ENDIAN - return vec_sld (v, v, (16 - (4 * idx)) & 15); -#else - return vec_sld (v, v, (4 * idx) & 15); -#endif -} - -static inline vector4x_u32 -vec_load_le (unsigned long offset, const unsigned char *ptr) -{ - vector4x_u32 vec; - vec = vec_vsx_ld (offset, (const uint32_t *)ptr); -#if __BYTE_ORDER == __BIG_ENDIAN - vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec, - le_bswap_const); -#endif - return vec; -} - -static inline void -vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr) -{ -#if __BYTE_ORDER == __BIG_ENDIAN - vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec, - le_bswap_const); -#endif - vec_vsx_st (vec, offset, (uint32_t *)ptr); -} - - -static inline vector4x_u32 -vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a) -{ -#if __BYTE_ORDER == __BIG_ENDIAN - static const vector16x_u8 swap32 = - { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 }; - vector2x_u64 vec, add, sum; - - vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32); - add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32); - sum = vec + add; - return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32); -#else - return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a)); -#endif -} - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define ROTATE(v1,rolv) \ - 
__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv)) - -#define PLUS(ds,s) \ - ((ds) += (s)) - -#define XOR(ds,s) \ - ((ds) ^= (s)) - -#define ADD_U64(v,a) \ - (v = vec_add_ctr_u64(v, a)) - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0, x1, x2, x3) ({ \ - vector4x_u32 t1 = vec_mergeh(x0, x2); \ - vector4x_u32 t2 = vec_mergel(x0, x2); \ - vector4x_u32 t3 = vec_mergeh(x1, x3); \ - x3 = vec_mergel(x1, x3); \ - x0 = vec_mergeh(t1, t3); \ - x1 = vec_mergel(t1, t3); \ - x2 = vec_mergeh(t2, x3); \ - x3 = vec_mergel(t2, x3); \ - }) - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2) \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE(d1, rotate_16); ROTATE(d2, rotate_16); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE(b1, rotate_12); ROTATE(b2, rotate_12); \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE(d1, rotate_8); ROTATE(d2, rotate_8); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE(b1, rotate_7); ROTATE(b2, rotate_7); - -unsigned int attribute_hidden -__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t nblks) -{ - vector4x_u32 counters_0123 = { 0, 1, 2, 3 }; - vector4x_u32 counter_4 = { 4, 0, 0, 0 }; - vector4x_u32 rotate_16 = { 16, 16, 16, 16 }; - vector4x_u32 rotate_12 = { 12, 12, 12, 12 }; - vector4x_u32 rotate_8 = { 8, 8, 8, 8 }; - vector4x_u32 rotate_7 = { 7, 7, 7, 7 }; - vector4x_u32 state0, state1, state2, state3; - vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7; - vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15; - vector4x_u32 tmp; - int i; - - /* Force preload of constants to vector registers. 
*/ - __asm__ ("": "+v" (counters_0123) :: "memory"); - __asm__ ("": "+v" (counter_4) :: "memory"); - __asm__ ("": "+v" (rotate_16) :: "memory"); - __asm__ ("": "+v" (rotate_12) :: "memory"); - __asm__ ("": "+v" (rotate_8) :: "memory"); - __asm__ ("": "+v" (rotate_7) :: "memory"); - - state0 = vec_vsx_ld (0 * 16, state); - state1 = vec_vsx_ld (1 * 16, state); - state2 = vec_vsx_ld (2 * 16, state); - state3 = vec_vsx_ld (3 * 16, state); - - do - { - v0 = vec_splat (state0, 0); - v1 = vec_splat (state0, 1); - v2 = vec_splat (state0, 2); - v3 = vec_splat (state0, 3); - v4 = vec_splat (state1, 0); - v5 = vec_splat (state1, 1); - v6 = vec_splat (state1, 2); - v7 = vec_splat (state1, 3); - v8 = vec_splat (state2, 0); - v9 = vec_splat (state2, 1); - v10 = vec_splat (state2, 2); - v11 = vec_splat (state2, 3); - v12 = vec_splat (state3, 0); - v13 = vec_splat (state3, 1); - v14 = vec_splat (state3, 2); - v15 = vec_splat (state3, 3); - - v12 += counters_0123; - v13 -= vec_cmplt (v12, counters_0123); - - for (i = 20; i > 0; i -= 2) - { - QUARTERROUND2 (v0, v4, v8, v12, v1, v5, v9, v13) - QUARTERROUND2 (v2, v6, v10, v14, v3, v7, v11, v15) - QUARTERROUND2 (v0, v5, v10, v15, v1, v6, v11, v12) - QUARTERROUND2 (v2, v7, v8, v13, v3, v4, v9, v14) - } - - v0 += vec_splat (state0, 0); - v1 += vec_splat (state0, 1); - v2 += vec_splat (state0, 2); - v3 += vec_splat (state0, 3); - v4 += vec_splat (state1, 0); - v5 += vec_splat (state1, 1); - v6 += vec_splat (state1, 2); - v7 += vec_splat (state1, 3); - v8 += vec_splat (state2, 0); - v9 += vec_splat (state2, 1); - v10 += vec_splat (state2, 2); - v11 += vec_splat (state2, 3); - tmp = vec_splat( state3, 0); - tmp += counters_0123; - v12 += tmp; - v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123); - v14 += vec_splat (state3, 2); - v15 += vec_splat (state3, 3); - ADD_U64 (state3, counter_4); - - transpose_4x4 (v0, v1, v2, v3); - transpose_4x4 (v4, v5, v6, v7); - transpose_4x4 (v8, v9, v10, v11); - transpose_4x4 (v12, v13, v14, v15); 
- - vec_store_le (v0, (64 * 0 + 16 * 0), dst); - vec_store_le (v1, (64 * 1 + 16 * 0), dst); - vec_store_le (v2, (64 * 2 + 16 * 0), dst); - vec_store_le (v3, (64 * 3 + 16 * 0), dst); - - vec_store_le (v4, (64 * 0 + 16 * 1), dst); - vec_store_le (v5, (64 * 1 + 16 * 1), dst); - vec_store_le (v6, (64 * 2 + 16 * 1), dst); - vec_store_le (v7, (64 * 3 + 16 * 1), dst); - - vec_store_le (v8, (64 * 0 + 16 * 2), dst); - vec_store_le (v9, (64 * 1 + 16 * 2), dst); - vec_store_le (v10, (64 * 2 + 16 * 2), dst); - vec_store_le (v11, (64 * 3 + 16 * 2), dst); - - vec_store_le (v12, (64 * 0 + 16 * 3), dst); - vec_store_le (v13, (64 * 1 + 16 * 3), dst); - vec_store_le (v14, (64 * 2 + 16 * 3), dst); - vec_store_le (v15, (64 * 3 + 16 * 3), dst); - - src += 4*64; - dst += 4*64; - - nblks -= 4; - } - while (nblks); - - vec_vsx_st (state3, 3 * 16, state); - - return 0; -} diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h deleted file mode 100644 index ded06762b6..0000000000 --- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h +++ /dev/null @@ -1,37 +0,0 @@ -/* PowerPC optimization for ChaCha20. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <stdbool.h> -#include <ldsodefs.h> - -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static void -chacha20_crypt (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, - "CHACHA20_BUFSIZE not multiple of 4"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); - - __chacha20_power8_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -} diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile index 96c110f490..66ed844e68 100644 --- a/sysdeps/s390/s390-64/Makefile +++ b/sysdeps/s390/s390-64/Makefile @@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache endif endif # $(subdir) == elf - -ifeq ($(subdir),stdlib) -sysdep_routines += \ - chacha20-s390x \ - # sysdep_routines -endif diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S deleted file mode 100644 index e38504d370..0000000000 --- a/sysdeps/s390/s390-64/chacha20-s390x.S +++ /dev/null @@ -1,573 +0,0 @@ -/* Optimized s390x implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -/* chacha20-s390x.S - zSeries implementation of ChaCha20 cipher - - Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. - */ - -#include <sysdep.h> - -#ifdef HAVE_S390_VX_ASM_SUPPORT - -/* CFA expressions are used for pointing CFA and registers to - * SP relative offsets. */ -# define DW_REGNO_SP 15 - -/* Fixed length encoding used for integers for now. 
*/ -# define DW_SLEB128_7BIT(value) \ - 0x00|((value) & 0x7f) -# define DW_SLEB128_28BIT(value) \ - 0x80|((value)&0x7f), \ - 0x80|(((value)>>7)&0x7f), \ - 0x80|(((value)>>14)&0x7f), \ - 0x00|(((value)>>21)&0x7f) - -# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \ - .cfi_escape \ - 0x0f, /* DW_CFA_def_cfa_expression */ \ - DW_SLEB128_7BIT(11), /* length */ \ - 0x7f, /* DW_OP_breg15, rsp + constant */ \ - DW_SLEB128_28BIT(rsp_offs), \ - 0x06, /* DW_OP_deref */ \ - 0x23, /* DW_OP_plus_constu */ \ - DW_SLEB128_28BIT((cfa_depth)+160) - -.machine "z13+vx" -.text - -.balign 16 -.Lconsts: -.Lwordswap: - .byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3 -.Lbswap128: - .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 -.Lbswap32: - .byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 -.Lone: - .long 0, 0, 0, 1 -.Ladd_counter_0123: - .long 0, 1, 2, 3 -.Ladd_counter_4567: - .long 4, 5, 6, 7 - -/* register macros */ -#define INPUT %r2 -#define DST %r3 -#define SRC %r4 -#define NBLKS %r0 -#define ROUND %r1 - -/* stack structure */ - -#define STACK_FRAME_STD (8 * 16 + 8 * 4) -#define STACK_FRAME_F8_F15 (8 * 8) -#define STACK_FRAME_Y0_Y15 (16 * 16) -#define STACK_FRAME_CTR (4 * 16) -#define STACK_FRAME_PARAMS (6 * 8) - -#define STACK_MAX (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \ - STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \ - STACK_FRAME_PARAMS) - -#define STACK_F8 (STACK_MAX - STACK_FRAME_F8_F15) -#define STACK_F9 (STACK_F8 + 8) -#define STACK_F10 (STACK_F9 + 8) -#define STACK_F11 (STACK_F10 + 8) -#define STACK_F12 (STACK_F11 + 8) -#define STACK_F13 (STACK_F12 + 8) -#define STACK_F14 (STACK_F13 + 8) -#define STACK_F15 (STACK_F14 + 8) -#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15) -#define STACK_CTR (STACK_Y0_Y15 - STACK_FRAME_CTR) -#define STACK_INPUT (STACK_CTR - STACK_FRAME_PARAMS) -#define STACK_DST (STACK_INPUT + 8) -#define STACK_SRC (STACK_DST + 8) -#define STACK_NBLKS (STACK_SRC + 8) -#define STACK_POCTX (STACK_NBLKS + 8) -#define STACK_POSRC 
(STACK_POCTX + 8) - -#define STACK_G0_H3 STACK_Y0_Y15 - -/* vector registers */ -#define A0 %v0 -#define A1 %v1 -#define A2 %v2 -#define A3 %v3 - -#define B0 %v4 -#define B1 %v5 -#define B2 %v6 -#define B3 %v7 - -#define C0 %v8 -#define C1 %v9 -#define C2 %v10 -#define C3 %v11 - -#define D0 %v12 -#define D1 %v13 -#define D2 %v14 -#define D3 %v15 - -#define E0 %v16 -#define E1 %v17 -#define E2 %v18 -#define E3 %v19 - -#define F0 %v20 -#define F1 %v21 -#define F2 %v22 -#define F3 %v23 - -#define G0 %v24 -#define G1 %v25 -#define G2 %v26 -#define G3 %v27 - -#define H0 %v28 -#define H1 %v29 -#define H2 %v30 -#define H3 %v31 - -#define IO0 E0 -#define IO1 E1 -#define IO2 E2 -#define IO3 E3 -#define IO4 F0 -#define IO5 F1 -#define IO6 F2 -#define IO7 F3 - -#define S0 G0 -#define S1 G1 -#define S2 G2 -#define S3 G3 - -#define TMP0 H0 -#define TMP1 H1 -#define TMP2 H2 -#define TMP3 H3 - -#define X0 A0 -#define X1 A1 -#define X2 A2 -#define X3 A3 -#define X4 B0 -#define X5 B1 -#define X6 B2 -#define X7 B3 -#define X8 C0 -#define X9 C1 -#define X10 C2 -#define X11 C3 -#define X12 D0 -#define X13 D1 -#define X14 D2 -#define X15 D3 - -#define Y0 E0 -#define Y1 E1 -#define Y2 E2 -#define Y3 E3 -#define Y4 F0 -#define Y5 F1 -#define Y6 F2 -#define Y7 F3 -#define Y8 G0 -#define Y9 G1 -#define Y10 G2 -#define Y11 G3 -#define Y12 H0 -#define Y13 H1 -#define Y14 H2 -#define Y15 H3 - -/********************************************************************** - helper macros - **********************************************************************/ - -#define _ /*_*/ - -#define START_STACK(last_r) \ - lgr %r0, %r15; \ - lghi %r1, ~15; \ - stmg %r6, last_r, 6 * 8(%r15); \ - aghi %r0, -STACK_MAX; \ - ngr %r0, %r1; \ - lgr %r1, %r15; \ - cfi_def_cfa_register(1); \ - lgr %r15, %r0; \ - stg %r1, 0(%r15); \ - cfi_cfa_on_stack(0, 0); \ - std %f8, STACK_F8(%r15); \ - std %f9, STACK_F9(%r15); \ - std %f10, STACK_F10(%r15); \ - std %f11, STACK_F11(%r15); \ - std %f12, STACK_F12(%r15); \ - std %f13, 
STACK_F13(%r15); \ - std %f14, STACK_F14(%r15); \ - std %f15, STACK_F15(%r15); - -#define END_STACK(last_r) \ - lg %r1, 0(%r15); \ - ld %f8, STACK_F8(%r15); \ - ld %f9, STACK_F9(%r15); \ - ld %f10, STACK_F10(%r15); \ - ld %f11, STACK_F11(%r15); \ - ld %f12, STACK_F12(%r15); \ - ld %f13, STACK_F13(%r15); \ - ld %f14, STACK_F14(%r15); \ - ld %f15, STACK_F15(%r15); \ - lmg %r6, last_r, 6 * 8(%r1); \ - lgr %r15, %r1; \ - cfi_def_cfa_register(DW_REGNO_SP); - -#define PLUS(dst,src) \ - vaf dst, dst, src; - -#define XOR(dst,src) \ - vx dst, dst, src; - -#define ROTATE(v1,c) \ - verllf v1, v1, (c)(0); - -#define WORD_ROTATE(v1,s) \ - vsldb v1, v1, v1, ((s) * 4); - -#define DST_8(OPER, I, J) \ - OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \ - OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J); - -/********************************************************************** - round macros - **********************************************************************/ - -/********************************************************************** - 8-way chacha20 ("vertical") - **********************************************************************/ - -#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ - x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,\ - y8,y9,y10,y11,y12,y13,y14,y15,\ - op1,op2,op3,op4,op5,op6,op7,op8,\ - op9,op10,op11,op12) \ - op1; \ - PLUS(x0, x1); PLUS(x4, x5); \ - PLUS(x8, x9); PLUS(x12, x13); \ - PLUS(y0, y1); PLUS(y4, y5); \ - PLUS(y8, y9); PLUS(y12, y13); \ - op2; \ - XOR(x3, x0); XOR(x7, x4); \ - XOR(x11, x8); XOR(x15, x12); \ - XOR(y3, y0); XOR(y7, y4); \ - XOR(y11, y8); XOR(y15, y12); \ - op3; \ - ROTATE(x3, 16); ROTATE(x7, 16); \ - ROTATE(x11, 16); ROTATE(x15, 16); \ - ROTATE(y3, 16); ROTATE(y7, 16); \ - ROTATE(y11, 16); ROTATE(y15, 16); \ - op4; \ - PLUS(x2, x3); PLUS(x6, x7); \ - PLUS(x10, x11); PLUS(x14, x15); \ - PLUS(y2, y3); PLUS(y6, y7); \ - PLUS(y10, y11); PLUS(y14, y15); \ - op5; \ - XOR(x1, x2); XOR(x5, x6); \ - 
XOR(x9, x10); XOR(x13, x14); \ - XOR(y1, y2); XOR(y5, y6); \ - XOR(y9, y10); XOR(y13, y14); \ - op6; \ - ROTATE(x1,12); ROTATE(x5,12); \ - ROTATE(x9,12); ROTATE(x13,12); \ - ROTATE(y1,12); ROTATE(y5,12); \ - ROTATE(y9,12); ROTATE(y13,12); \ - op7; \ - PLUS(x0, x1); PLUS(x4, x5); \ - PLUS(x8, x9); PLUS(x12, x13); \ - PLUS(y0, y1); PLUS(y4, y5); \ - PLUS(y8, y9); PLUS(y12, y13); \ - op8; \ - XOR(x3, x0); XOR(x7, x4); \ - XOR(x11, x8); XOR(x15, x12); \ - XOR(y3, y0); XOR(y7, y4); \ - XOR(y11, y8); XOR(y15, y12); \ - op9; \ - ROTATE(x3,8); ROTATE(x7,8); \ - ROTATE(x11,8); ROTATE(x15,8); \ - ROTATE(y3,8); ROTATE(y7,8); \ - ROTATE(y11,8); ROTATE(y15,8); \ - op10; \ - PLUS(x2, x3); PLUS(x6, x7); \ - PLUS(x10, x11); PLUS(x14, x15); \ - PLUS(y2, y3); PLUS(y6, y7); \ - PLUS(y10, y11); PLUS(y14, y15); \ - op11; \ - XOR(x1, x2); XOR(x5, x6); \ - XOR(x9, x10); XOR(x13, x14); \ - XOR(y1, y2); XOR(y5, y6); \ - XOR(y9, y10); XOR(y13, y14); \ - op12; \ - ROTATE(x1,7); ROTATE(x5,7); \ - ROTATE(x9,7); ROTATE(x13,7); \ - ROTATE(y1,7); ROTATE(y5,7); \ - ROTATE(y9,7); ROTATE(y13,7); - -#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \ - QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ - x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,\ - y8,y9,y10,y11,y12,y13,y14,y15,\ - ,,,,,,,,,,,) - -#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \ - vmrhf tmp0, v0, v1; \ - vmrhf tmp1, v2, v3; \ - vmrlf tmp2, v0, v1; \ - vmrlf v3, v2, v3; \ - vmrhf tmpa, va, vb; \ - vmrhf tmpb, vc, vd; \ - vmrlf tmpc, va, vb; \ - vmrlf vd, vc, vd; \ - vpdi v0, tmp0, tmp1, 0; \ - vpdi v1, tmp0, tmp1, 5; \ - vpdi v2, tmp2, v3, 0; \ - vpdi v3, tmp2, v3, 5; \ - vpdi va, tmpa, tmpb, 0; \ - vpdi vb, tmpa, tmpb, 5; \ - vpdi vc, tmpc, vd, 0; \ - vpdi vd, tmpc, vd, 5; - -.balign 8 -.globl __chacha20_s390x_vx_blocks8 -ENTRY (__chacha20_s390x_vx_blocks8) - /* input: - * %r2: input - * %r3: dst - * 
%r4: src - * %r5: nblks (multiple of 8) - */ - - START_STACK(%r8); - lgr NBLKS, %r5; - - larl %r7, .Lconsts; - - /* Load counter. */ - lg %r8, (12 * 4)(INPUT); - rllg %r8, %r8, 32; - -.balign 4 - /* Process eight chacha20 blocks per loop. */ -.Lloop8: - vlm Y0, Y3, 0(INPUT); - - slgfi NBLKS, 8; - lghi ROUND, (20 / 2); - - /* Construct counter vectors X12/X13 & Y12/Y13. */ - vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7); - vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7); - vrepf Y12, Y3, 0; - vrepf Y13, Y3, 1; - vaccf X5, Y12, X4; - vaccf Y5, Y12, Y4; - vaf X12, Y12, X4; - vaf Y12, Y12, Y4; - vaf X13, Y13, X5; - vaf Y13, Y13, Y5; - - vrepf X0, Y0, 0; - vrepf X1, Y0, 1; - vrepf X2, Y0, 2; - vrepf X3, Y0, 3; - vrepf X4, Y1, 0; - vrepf X5, Y1, 1; - vrepf X6, Y1, 2; - vrepf X7, Y1, 3; - vrepf X8, Y2, 0; - vrepf X9, Y2, 1; - vrepf X10, Y2, 2; - vrepf X11, Y2, 3; - vrepf X14, Y3, 2; - vrepf X15, Y3, 3; - - /* Store counters for blocks 0-7. */ - vstm X12, X13, (STACK_CTR + 0 * 16)(%r15); - vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15); - - vlr Y0, X0; - vlr Y1, X1; - vlr Y2, X2; - vlr Y3, X3; - vlr Y4, X4; - vlr Y5, X5; - vlr Y6, X6; - vlr Y7, X7; - vlr Y8, X8; - vlr Y9, X9; - vlr Y10, X10; - vlr Y11, X11; - vlr Y14, X14; - vlr Y15, X15; - - /* Update and store counter. */ - agfi %r8, 8; - rllg %r5, %r8, 32; - stg %r5, (12 * 4)(INPUT); - -.balign 4 -.Lround2_8: - QUARTERROUND4_V8(X0, X4, X8, X12, X1, X5, X9, X13, - X2, X6, X10, X14, X3, X7, X11, X15, - Y0, Y4, Y8, Y12, Y1, Y5, Y9, Y13, - Y2, Y6, Y10, Y14, Y3, Y7, Y11, Y15); - QUARTERROUND4_V8(X0, X5, X10, X15, X1, X6, X11, X12, - X2, X7, X8, X13, X3, X4, X9, X14, - Y0, Y5, Y10, Y15, Y1, Y6, Y11, Y12, - Y2, Y7, Y8, Y13, Y3, Y4, Y9, Y14); - brctg ROUND, .Lround2_8; - - /* Store blocks 4-7. */ - vstm Y0, Y15, STACK_Y0_Y15(%r15); - - /* Load counters for blocks 0-3. */ - vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15); - - lghi ROUND, 1; - j .Lfirst_output_4blks_8; - -.balign 4 -.Lsecond_output_4blks_8: - /* Load blocks 4-7. 
*/ - vlm X0, X15, STACK_Y0_Y15(%r15); - - /* Load counters for blocks 4-7. */ - vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15); - - lghi ROUND, 0; - -.balign 4 - /* Output four chacha20 blocks per loop. */ -.Lfirst_output_4blks_8: - vlm Y12, Y15, 0(INPUT); - PLUS(X12, Y0); - PLUS(X13, Y1); - vrepf Y0, Y12, 0; - vrepf Y1, Y12, 1; - vrepf Y2, Y12, 2; - vrepf Y3, Y12, 3; - vrepf Y4, Y13, 0; - vrepf Y5, Y13, 1; - vrepf Y6, Y13, 2; - vrepf Y7, Y13, 3; - vrepf Y8, Y14, 0; - vrepf Y9, Y14, 1; - vrepf Y10, Y14, 2; - vrepf Y11, Y14, 3; - vrepf Y14, Y15, 2; - vrepf Y15, Y15, 3; - PLUS(X0, Y0); - PLUS(X1, Y1); - PLUS(X2, Y2); - PLUS(X3, Y3); - PLUS(X4, Y4); - PLUS(X5, Y5); - PLUS(X6, Y6); - PLUS(X7, Y7); - PLUS(X8, Y8); - PLUS(X9, Y9); - PLUS(X10, Y10); - PLUS(X11, Y11); - PLUS(X14, Y14); - PLUS(X15, Y15); - - vl Y15, (.Lbswap32 - .Lconsts)(%r7); - TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7, - Y9, Y10, Y11, Y12, Y13, Y14); - TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15, - Y9, Y10, Y11, Y12, Y13, Y14); - - vlm Y0, Y14, 0(SRC); - vperm X0, X0, X0, Y15; - vperm X1, X1, X1, Y15; - vperm X2, X2, X2, Y15; - vperm X3, X3, X3, Y15; - vperm X4, X4, X4, Y15; - vperm X5, X5, X5, Y15; - vperm X6, X6, X6, Y15; - vperm X7, X7, X7, Y15; - vperm X8, X8, X8, Y15; - vperm X9, X9, X9, Y15; - vperm X10, X10, X10, Y15; - vperm X11, X11, X11, Y15; - vperm X12, X12, X12, Y15; - vperm X13, X13, X13, Y15; - vperm X14, X14, X14, Y15; - vperm X15, X15, X15, Y15; - vl Y15, (15 * 16)(SRC); - - XOR(Y0, X0); - XOR(Y1, X4); - XOR(Y2, X8); - XOR(Y3, X12); - XOR(Y4, X1); - XOR(Y5, X5); - XOR(Y6, X9); - XOR(Y7, X13); - XOR(Y8, X2); - XOR(Y9, X6); - XOR(Y10, X10); - XOR(Y11, X14); - XOR(Y12, X3); - XOR(Y13, X7); - XOR(Y14, X11); - XOR(Y15, X15); - vstm Y0, Y15, 0(DST); - - aghi SRC, 256; - aghi DST, 256; - - clgije ROUND, 1, .Lsecond_output_4blks_8; - - clgijhe NBLKS, 8, .Lloop8; - - - END_STACK(%r8); - xgr %r2, %r2; - br %r14; -END (__chacha20_s390x_vx_blocks8) - -#endif /* HAVE_S390_VX_ASM_SUPPORT */ diff 
--git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h deleted file mode 100644 index 0c6abf77e8..0000000000 --- a/sysdeps/s390/s390-64/chacha20_arch.h +++ /dev/null @@ -1,45 +0,0 @@ -/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <stdbool.h> -#include <ldsodefs.h> -#include <sys/auxv.h> - -unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static inline void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ -#ifdef HAVE_S390_VX_ASM_SUPPORT - _Static_assert (CHACHA20_BUFSIZE % 8 == 0, - "CHACHA20_BUFSIZE not multiple of 8"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); - - if (GLRO(dl_hwcap) & HWCAP_S390_VX) - { - __chacha20_s390x_vx_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); - return; - } -#endif - chacha20_crypt_generic (state, dst, src, bytes); -} diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile index 2ccc92b6b8..db28c65799 100644 --- a/sysdeps/unix/sysv/linux/Makefile +++ b/sysdeps/unix/sysv/linux/Makefile @@ -380,7 +380,8 @@ sysdep_routines += xstatconv internal_statvfs \ open_nocancel open64_nocancel \ openat_nocancel openat64_nocancel \ read_nocancel pread64_nocancel \ - write_nocancel statx_cp stat_t64_cp + write_nocancel statx_cp stat_t64_cp \ + poll_nocancel sysdep_headers += bits/fcntl-linux.h diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions index 65d2ceda2c..04c3d37551 100644 --- a/sysdeps/unix/sysv/linux/Versions +++ b/sysdeps/unix/sysv/linux/Versions @@ -320,6 +320,7 @@ libc { __read_nocancel; __pread64_nocancel; __close_nocancel; + __poll_nocancel; __sigtimedwait; # functions used by nscd __netlink_assert_response; diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h index 2c58d5ae2f..71361e7e96 100644 --- a/sysdeps/unix/sysv/linux/not-cancel.h +++ b/sysdeps/unix/sysv/linux/not-cancel.h @@ -23,6 +23,7 @@ #include <sysdep.h> #include <errno.h> #include <unistd.h> +#include <sys/poll.h> #include <sys/syscall.h> #include <sys/wait.h> #include <time.h> @@ -77,6 
+78,9 @@ __getrandom_nocancel (void *buf, size_t buflen, unsigned int flags)
 /* Uncancelable fcntl. */
 __typeof (__fcntl) __fcntl64_nocancel;
 
+/* Uncancelable poll. */
+__typeof (__poll) __poll_nocancel;
+
 #if IS_IN (libc) || IS_IN (rtld)
 hidden_proto (__open_nocancel)
 hidden_proto (__open64_nocancel)
@@ -87,6 +91,7 @@ hidden_proto (__pread64_nocancel)
 hidden_proto (__write_nocancel)
 hidden_proto (__close_nocancel)
 hidden_proto (__fcntl64_nocancel)
+hidden_proto (__poll_nocancel)
 #endif
 
 #endif /* NOT_CANCEL_H */
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/unix/sysv/linux/poll_nocancel.c
similarity index 68%
rename from sysdeps/generic/chacha20_arch.h
rename to sysdeps/unix/sysv/linux/poll_nocancel.c
index 1b4559ccbc..462e6f8464 100644
--- a/sysdeps/generic/chacha20_arch.h
+++ b/sysdeps/unix/sysv/linux/poll_nocancel.c
@@ -1,5 +1,5 @@
-/* Chacha20 implementation, generic interface for encrypt.
-   Copyright (C) 2022 Free Software Foundation, Inc.
+/* Linux poll syscall implementation -- non-cancellable.
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -16,9 +16,13 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.
*/
 
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-                size_t bytes)
+#include <unistd.h>
+#include <sysdep-cancel.h>
+#include <not-cancel.h>
+
+int
+__poll_nocancel (struct pollfd *fds, nfds_t nfds, int timeout)
 {
-  chacha20_crypt_generic (state, dst, src, bytes);
+  return INLINE_SYSCALL_CALL (poll, fds, nfds, timeout);
 }
+hidden_def (__poll_nocancel)
diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c
index 0326ebb767..c8a9ed2d40 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.c
+++ b/sysdeps/unix/sysv/linux/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>. */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -26,13 +25,4 @@ __glibc_tls_internal_free (void)
   struct pthread *self = THREAD_SELF;
   free (self->tls_state.strsignal_buf);
   free (self->tls_state.strerror_l_buf);
-
-  if (self->tls_state.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-         cached it won't leak any data. */
-      explicit_bzero (self->tls_state.rand_state,
-                      sizeof (*self->tls_state.rand_state));
-      free (self->tls_state.rand_state);
-    }
 }
diff --git a/sysdeps/unix/sysv/linux/tls-internal.h b/sysdeps/unix/sysv/linux/tls-internal.h
index ebc65d896a..2ebe977802 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.h
+++ b/sysdeps/unix/sysv/linux/tls-internal.h
@@ -28,7 +28,6 @@ __glibc_tls_internal (void)
   return &THREAD_SELF->tls_state;
 }
 
-/* Reset the arc4random TCB state on fork.
*/
 extern void __glibc_tls_internal_free (void) attribute_hidden;
 
 #endif
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 1178475d75..c19bef2dec 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -5,13 +5,6 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += link-defines.sym
 endif
 
-ifeq ($(subdir),stdlib)
-sysdep_routines += \
-  chacha20-amd64-sse2 \
-  chacha20-amd64-avx2 \
-  # sysdep_routines
-endif
-
 ifeq ($(subdir),gmon)
 sysdep_routines += _mcount
 # We cannot compile _mcount.S with -pg because that would create
diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S
deleted file mode 100644
index aefd1cdbd0..0000000000
--- a/sysdeps/x86_64/chacha20-amd64-avx2.S
+++ /dev/null
@@ -1,328 +0,0 @@
-/* Optimized AVX2 implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>. */
-
-/* chacha20-amd64-avx2.S - AVX2 implementation of ChaCha20 cipher
-
-   Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
- - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. -*/ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. */ - -#include <sysdep.h> - -#ifdef PIC -# define rRIP (%rip) -#else -# define rRIP -#endif - -/* register macros */ -#define INPUT %rdi -#define DST %rsi -#define SRC %rdx -#define NBLKS %rcx -#define ROUND %eax - -/* stack structure */ -#define STACK_VEC_X12 (32) -#define STACK_VEC_X13 (32 + STACK_VEC_X12) -#define STACK_TMP (32 + STACK_VEC_X13) -#define STACK_TMP1 (32 + STACK_TMP) - -#define STACK_MAX (32 + STACK_TMP1) - -/* vector registers */ -#define X0 %ymm0 -#define X1 %ymm1 -#define X2 %ymm2 -#define X3 %ymm3 -#define X4 %ymm4 -#define X5 %ymm5 -#define X6 %ymm6 -#define X7 %ymm7 -#define X8 %ymm8 -#define X9 %ymm9 -#define X10 %ymm10 -#define X11 %ymm11 -#define X12 %ymm12 -#define X13 %ymm13 -#define X14 %ymm14 -#define X15 %ymm15 - -#define X0h %xmm0 -#define X1h %xmm1 -#define X2h %xmm2 -#define X3h %xmm3 -#define X4h %xmm4 -#define X5h %xmm5 -#define X6h %xmm6 -#define X7h %xmm7 -#define X8h %xmm8 -#define X9h %xmm9 -#define X10h %xmm10 -#define X11h %xmm11 -#define X12h %xmm12 -#define X13h %xmm13 -#define X14h %xmm14 -#define X15h %xmm15 - -/********************************************************************** - helper macros - 
**********************************************************************/ - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0,x1,x2,x3,t1,t2) \ - vpunpckhdq x1, x0, t2; \ - vpunpckldq x1, x0, x0; \ - \ - vpunpckldq x3, x2, t1; \ - vpunpckhdq x3, x2, x2; \ - \ - vpunpckhqdq t1, x0, x1; \ - vpunpcklqdq t1, x0, x0; \ - \ - vpunpckhqdq x2, t2, x3; \ - vpunpcklqdq x2, t2, x2; - -/* 2x2 128-bit matrix transpose */ -#define transpose_16byte_2x2(x0,x1,t1) \ - vmovdqa x0, t1; \ - vperm2i128 $0x20, x1, x0, x0; \ - vperm2i128 $0x31, x1, t1, x1; - -/********************************************************************** - 8-way chacha20 - **********************************************************************/ - -#define ROTATE2(v1,v2,c,tmp) \ - vpsrld $(32 - (c)), v1, tmp; \ - vpslld $(c), v1, v1; \ - vpaddb tmp, v1, v1; \ - vpsrld $(32 - (c)), v2, tmp; \ - vpslld $(c), v2, v2; \ - vpaddb tmp, v2, v2; - -#define ROTATE_SHUF_2(v1,v2,shuf) \ - vpshufb shuf, v1, v1; \ - vpshufb shuf, v2, v2; - -#define XOR(ds,s) \ - vpxor s, ds, ds; - -#define PLUS(ds,s) \ - vpaddd s, ds, ds; - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\ - interleave_op1,interleave_op2,\ - interleave_op3,interleave_op4) \ - vbroadcasti128 .Lshuf_rol16 rRIP, tmp1; \ - interleave_op1; \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE_SHUF_2(d1, d2, tmp1); \ - interleave_op2; \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 12, tmp1); \ - vbroadcasti128 .Lshuf_rol8 rRIP, tmp1; \ - interleave_op3; \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE_SHUF_2(d1, d2, tmp1); \ - interleave_op4; \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 7, tmp1); - - .section .text.avx2, "ax", @progbits - .align 32 -chacha20_data: -L(shuf_rol16): - .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13 -L(shuf_rol8): - .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14 -L(inc_counter): - .byte 0,1,2,3,4,5,6,7 -L(unsigned_cmp): - .long 
0x80000000 - - .hidden __chacha20_avx2_blocks8 -ENTRY (__chacha20_avx2_blocks8) - /* input: - * %rdi: input - * %rsi: dst - * %rdx: src - * %rcx: nblks (multiple of 8) - */ - vzeroupper; - - pushq %rbp; - cfi_adjust_cfa_offset(8); - cfi_rel_offset(rbp, 0) - movq %rsp, %rbp; - cfi_def_cfa_register(rbp); - - subq $STACK_MAX, %rsp; - andq $~31, %rsp; - -L(loop8): - mov $20, ROUND; - - /* Construct counter vectors X12 and X13 */ - vpmovzxbd L(inc_counter) rRIP, X0; - vpbroadcastd L(unsigned_cmp) rRIP, X2; - vpbroadcastd (12 * 4)(INPUT), X12; - vpbroadcastd (13 * 4)(INPUT), X13; - vpaddd X0, X12, X12; - vpxor X2, X0, X0; - vpxor X2, X12, X1; - vpcmpgtd X1, X0, X0; - vpsubd X0, X13, X13; - vmovdqa X12, (STACK_VEC_X12)(%rsp); - vmovdqa X13, (STACK_VEC_X13)(%rsp); - - /* Load vectors */ - vpbroadcastd (0 * 4)(INPUT), X0; - vpbroadcastd (1 * 4)(INPUT), X1; - vpbroadcastd (2 * 4)(INPUT), X2; - vpbroadcastd (3 * 4)(INPUT), X3; - vpbroadcastd (4 * 4)(INPUT), X4; - vpbroadcastd (5 * 4)(INPUT), X5; - vpbroadcastd (6 * 4)(INPUT), X6; - vpbroadcastd (7 * 4)(INPUT), X7; - vpbroadcastd (8 * 4)(INPUT), X8; - vpbroadcastd (9 * 4)(INPUT), X9; - vpbroadcastd (10 * 4)(INPUT), X10; - vpbroadcastd (11 * 4)(INPUT), X11; - vpbroadcastd (14 * 4)(INPUT), X14; - vpbroadcastd (15 * 4)(INPUT), X15; - vmovdqa X15, (STACK_TMP)(%rsp); - -L(round2): - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X15,,,,) - vmovdqa (STACK_TMP)(%rsp), X15; - vmovdqa X8, (STACK_TMP)(%rsp); - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,,,,) - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,,,,) - vmovdqa (STACK_TMP)(%rsp), X8; - vmovdqa X15, (STACK_TMP)(%rsp); - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X15,,,,) - sub $2, ROUND; - jnz L(round2); - - vmovdqa X8, (STACK_TMP1)(%rsp); - - /* tmp := X15 */ - vpbroadcastd (0 * 4)(INPUT), X15; - PLUS(X0, X15); - vpbroadcastd (1 * 4)(INPUT), X15; - PLUS(X1, X15); - vpbroadcastd (2 * 4)(INPUT), X15; - PLUS(X2, X15); - vpbroadcastd (3 
* 4)(INPUT), X15; - PLUS(X3, X15); - vpbroadcastd (4 * 4)(INPUT), X15; - PLUS(X4, X15); - vpbroadcastd (5 * 4)(INPUT), X15; - PLUS(X5, X15); - vpbroadcastd (6 * 4)(INPUT), X15; - PLUS(X6, X15); - vpbroadcastd (7 * 4)(INPUT), X15; - PLUS(X7, X15); - transpose_4x4(X0, X1, X2, X3, X8, X15); - transpose_4x4(X4, X5, X6, X7, X8, X15); - vmovdqa (STACK_TMP1)(%rsp), X8; - transpose_16byte_2x2(X0, X4, X15); - transpose_16byte_2x2(X1, X5, X15); - transpose_16byte_2x2(X2, X6, X15); - transpose_16byte_2x2(X3, X7, X15); - vmovdqa (STACK_TMP)(%rsp), X15; - vmovdqu X0, (64 * 0 + 16 * 0)(DST) - vmovdqu X1, (64 * 1 + 16 * 0)(DST) - vpbroadcastd (8 * 4)(INPUT), X0; - PLUS(X8, X0); - vpbroadcastd (9 * 4)(INPUT), X0; - PLUS(X9, X0); - vpbroadcastd (10 * 4)(INPUT), X0; - PLUS(X10, X0); - vpbroadcastd (11 * 4)(INPUT), X0; - PLUS(X11, X0); - vmovdqa (STACK_VEC_X12)(%rsp), X0; - PLUS(X12, X0); - vmovdqa (STACK_VEC_X13)(%rsp), X0; - PLUS(X13, X0); - vpbroadcastd (14 * 4)(INPUT), X0; - PLUS(X14, X0); - vpbroadcastd (15 * 4)(INPUT), X0; - PLUS(X15, X0); - vmovdqu X2, (64 * 2 + 16 * 0)(DST) - vmovdqu X3, (64 * 3 + 16 * 0)(DST) - - /* Update counter */ - addq $8, (12 * 4)(INPUT); - - transpose_4x4(X8, X9, X10, X11, X0, X1); - transpose_4x4(X12, X13, X14, X15, X0, X1); - vmovdqu X4, (64 * 4 + 16 * 0)(DST) - vmovdqu X5, (64 * 5 + 16 * 0)(DST) - transpose_16byte_2x2(X8, X12, X0); - transpose_16byte_2x2(X9, X13, X0); - transpose_16byte_2x2(X10, X14, X0); - transpose_16byte_2x2(X11, X15, X0); - vmovdqu X6, (64 * 6 + 16 * 0)(DST) - vmovdqu X7, (64 * 7 + 16 * 0)(DST) - vmovdqu X8, (64 * 0 + 16 * 2)(DST) - vmovdqu X9, (64 * 1 + 16 * 2)(DST) - vmovdqu X10, (64 * 2 + 16 * 2)(DST) - vmovdqu X11, (64 * 3 + 16 * 2)(DST) - vmovdqu X12, (64 * 4 + 16 * 2)(DST) - vmovdqu X13, (64 * 5 + 16 * 2)(DST) - vmovdqu X14, (64 * 6 + 16 * 2)(DST) - vmovdqu X15, (64 * 7 + 16 * 2)(DST) - - sub $8, NBLKS; - lea (8 * 64)(DST), DST; - lea (8 * 64)(SRC), SRC; - jnz L(loop8); - - vzeroupper; - - /* eax zeroed by round loop. 
*/ - leave; - cfi_adjust_cfa_offset(-8) - cfi_def_cfa_register(%rsp); - ret; - int3; -END(__chacha20_avx2_blocks8) diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S deleted file mode 100644 index 351a1109c6..0000000000 --- a/sysdeps/x86_64/chacha20-amd64-sse2.S +++ /dev/null @@ -1,311 +0,0 @@ -/* Optimized SSE2 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-amd64-ssse3.S - SSSE3 implementation of ChaCha20 cipher - - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. 
-*/ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. */ - -#include <sysdep.h> -#include <isa-level.h> - -#if MINIMUM_X86_ISA_LEVEL <= 2 - -#ifdef PIC -# define rRIP (%rip) -#else -# define rRIP -#endif - -/* 'ret' instruction replacement for straight-line speculation mitigation */ -#define ret_spec_stop \ - ret; int3; - -/* register macros */ -#define INPUT %rdi -#define DST %rsi -#define SRC %rdx -#define NBLKS %rcx -#define ROUND %eax - -/* stack structure */ -#define STACK_VEC_X12 (16) -#define STACK_VEC_X13 (16 + STACK_VEC_X12) -#define STACK_TMP (16 + STACK_VEC_X13) -#define STACK_TMP1 (16 + STACK_TMP) -#define STACK_TMP2 (16 + STACK_TMP1) - -#define STACK_MAX (16 + STACK_TMP2) - -/* vector registers */ -#define X0 %xmm0 -#define X1 %xmm1 -#define X2 %xmm2 -#define X3 %xmm3 -#define X4 %xmm4 -#define X5 %xmm5 -#define X6 %xmm6 -#define X7 %xmm7 -#define X8 %xmm8 -#define X9 %xmm9 -#define X10 %xmm10 -#define X11 %xmm11 -#define X12 %xmm12 -#define X13 %xmm13 -#define X14 %xmm14 -#define X15 %xmm15 - -/********************************************************************** - helper macros - **********************************************************************/ - -/* 4x4 32-bit integer matrix transpose */ -#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \ - movdqa x0, t2; \ - punpckhdq x1, t2; \ - punpckldq x1, x0; \ - \ - movdqa x2, t1; \ - punpckldq x3, t1; \ - punpckhdq x3, x2; \ - \ - movdqa x0, x1; \ - punpckhqdq t1, x1; \ - punpcklqdq t1, x0; \ - \ - movdqa t2, x3; \ - punpckhqdq x2, x3; \ - punpcklqdq x2, t2; \ - movdqa t2, x2; - -/* fill xmm register with 32-bit value from memory */ -#define PBROADCASTD(mem32, xreg) \ - movd mem32, xreg; \ - pshufd $0, xreg, xreg; - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define 
ROTATE2(v1,v2,c,tmp1,tmp2) \ - movdqa v1, tmp1; \ - movdqa v2, tmp2; \ - psrld $(32 - (c)), v1; \ - pslld $(c), tmp1; \ - paddb tmp1, v1; \ - psrld $(32 - (c)), v2; \ - pslld $(c), tmp2; \ - paddb tmp2, v2; - -#define XOR(ds,s) \ - pxor s, ds; - -#define PLUS(ds,s) \ - paddd s, ds; - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2) \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE2(d1, d2, 16, tmp1, tmp2); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 12, tmp1, tmp2); \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE2(d1, d2, 8, tmp1, tmp2); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 7, tmp1, tmp2); - - .section .text.sse2,"ax",@progbits - -chacha20_data: - .align 16 -L(counter1): - .long 1,0,0,0 -L(inc_counter): - .long 0,1,2,3 -L(unsigned_cmp): - .long 0x80000000,0x80000000,0x80000000,0x80000000 - - .hidden __chacha20_sse2_blocks4 -ENTRY (__chacha20_sse2_blocks4) - /* input: - * %rdi: input - * %rsi: dst - * %rdx: src - * %rcx: nblks (multiple of 4) - */ - - pushq %rbp; - cfi_adjust_cfa_offset(8); - cfi_rel_offset(rbp, 0) - movq %rsp, %rbp; - cfi_def_cfa_register(%rbp); - - subq $STACK_MAX, %rsp; - andq $~15, %rsp; - -L(loop4): - mov $20, ROUND; - - /* Construct counter vectors X12 and X13 */ - movdqa L(inc_counter) rRIP, X0; - movdqa L(unsigned_cmp) rRIP, X2; - PBROADCASTD((12 * 4)(INPUT), X12); - PBROADCASTD((13 * 4)(INPUT), X13); - paddd X0, X12; - movdqa X12, X1; - pxor X2, X0; - pxor X2, X1; - pcmpgtd X1, X0; - psubd X0, X13; - movdqa X12, (STACK_VEC_X12)(%rsp); - movdqa X13, (STACK_VEC_X13)(%rsp); - - /* Load vectors */ - PBROADCASTD((0 * 4)(INPUT), X0); - PBROADCASTD((1 * 4)(INPUT), X1); - PBROADCASTD((2 * 4)(INPUT), X2); - PBROADCASTD((3 * 4)(INPUT), X3); - PBROADCASTD((4 * 4)(INPUT), X4); - PBROADCASTD((5 * 4)(INPUT), X5); - PBROADCASTD((6 * 4)(INPUT), X6); - PBROADCASTD((7 * 4)(INPUT), X7); - PBROADCASTD((8 * 4)(INPUT), X8); - PBROADCASTD((9 * 
4)(INPUT), X9); - PBROADCASTD((10 * 4)(INPUT), X10); - PBROADCASTD((11 * 4)(INPUT), X11); - PBROADCASTD((14 * 4)(INPUT), X14); - PBROADCASTD((15 * 4)(INPUT), X15); - movdqa X11, (STACK_TMP)(%rsp); - movdqa X15, (STACK_TMP1)(%rsp); - -L(round2_4): - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X11,X15) - movdqa (STACK_TMP)(%rsp), X11; - movdqa (STACK_TMP1)(%rsp), X15; - movdqa X8, (STACK_TMP)(%rsp); - movdqa X9, (STACK_TMP1)(%rsp); - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,X9) - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,X9) - movdqa (STACK_TMP)(%rsp), X8; - movdqa (STACK_TMP1)(%rsp), X9; - movdqa X11, (STACK_TMP)(%rsp); - movdqa X15, (STACK_TMP1)(%rsp); - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X11,X15) - sub $2, ROUND; - jnz L(round2_4); - - /* tmp := X15 */ - movdqa (STACK_TMP)(%rsp), X11; - PBROADCASTD((0 * 4)(INPUT), X15); - PLUS(X0, X15); - PBROADCASTD((1 * 4)(INPUT), X15); - PLUS(X1, X15); - PBROADCASTD((2 * 4)(INPUT), X15); - PLUS(X2, X15); - PBROADCASTD((3 * 4)(INPUT), X15); - PLUS(X3, X15); - PBROADCASTD((4 * 4)(INPUT), X15); - PLUS(X4, X15); - PBROADCASTD((5 * 4)(INPUT), X15); - PLUS(X5, X15); - PBROADCASTD((6 * 4)(INPUT), X15); - PLUS(X6, X15); - PBROADCASTD((7 * 4)(INPUT), X15); - PLUS(X7, X15); - PBROADCASTD((8 * 4)(INPUT), X15); - PLUS(X8, X15); - PBROADCASTD((9 * 4)(INPUT), X15); - PLUS(X9, X15); - PBROADCASTD((10 * 4)(INPUT), X15); - PLUS(X10, X15); - PBROADCASTD((11 * 4)(INPUT), X15); - PLUS(X11, X15); - movdqa (STACK_VEC_X12)(%rsp), X15; - PLUS(X12, X15); - movdqa (STACK_VEC_X13)(%rsp), X15; - PLUS(X13, X15); - movdqa X13, (STACK_TMP)(%rsp); - PBROADCASTD((14 * 4)(INPUT), X15); - PLUS(X14, X15); - movdqa (STACK_TMP1)(%rsp), X15; - movdqa X14, (STACK_TMP1)(%rsp); - PBROADCASTD((15 * 4)(INPUT), X13); - PLUS(X15, X13); - movdqa X15, (STACK_TMP2)(%rsp); - - /* Update counter */ - addq $4, (12 * 4)(INPUT); - - TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15); - movdqu X0, (64 * 0 + 16 * 0)(DST) - 
movdqu X1, (64 * 1 + 16 * 0)(DST) - movdqu X2, (64 * 2 + 16 * 0)(DST) - movdqu X3, (64 * 3 + 16 * 0)(DST) - TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2); - movdqa (STACK_TMP)(%rsp), X13; - movdqa (STACK_TMP1)(%rsp), X14; - movdqa (STACK_TMP2)(%rsp), X15; - movdqu X4, (64 * 0 + 16 * 1)(DST) - movdqu X5, (64 * 1 + 16 * 1)(DST) - movdqu X6, (64 * 2 + 16 * 1)(DST) - movdqu X7, (64 * 3 + 16 * 1)(DST) - TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2); - movdqu X8, (64 * 0 + 16 * 2)(DST) - movdqu X9, (64 * 1 + 16 * 2)(DST) - movdqu X10, (64 * 2 + 16 * 2)(DST) - movdqu X11, (64 * 3 + 16 * 2)(DST) - TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2); - movdqu X12, (64 * 0 + 16 * 3)(DST) - movdqu X13, (64 * 1 + 16 * 3)(DST) - movdqu X14, (64 * 2 + 16 * 3)(DST) - movdqu X15, (64 * 3 + 16 * 3)(DST) - - sub $4, NBLKS; - lea (4 * 64)(DST), DST; - lea (4 * 64)(SRC), SRC; - jnz L(loop4); - - /* eax zeroed by round loop. */ - leave; - cfi_adjust_cfa_offset(-8) - cfi_def_cfa_register(%rsp); - ret_spec_stop; -END (__chacha20_sse2_blocks4) - -#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */ diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h deleted file mode 100644 index 6f3784e392..0000000000 --- a/sysdeps/x86_64/chacha20_arch.h +++ /dev/null @@ -1,55 +0,0 @@ -/* Chacha20 implementation, used on arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. 
- - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <isa-level.h> -#include <ldsodefs.h> -#include <cpu-features.h> -#include <sys/param.h> - -unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; -unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static inline void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0, - "CHACHA20_BUFSIZE not multiple of 4 or 8"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); - -#if MINIMUM_X86_ISA_LEVEL > 2 - __chacha20_avx2_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -#else - const struct cpu_features* cpu_features = __get_cpu_features (); - - /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. */ - if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) - && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !)) - __chacha20_avx2_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); - else - __chacha20_sse2_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -#endif -} -- 2.35.1 ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v3] arc4random: simplify design for better safety 2022-07-26 11:07 ` [PATCH v3] " Jason A. Donenfeld @ 2022-07-26 11:11 ` Jason A. Donenfeld 0 siblings, 0 replies; 81+ messages in thread From: Jason A. Donenfeld @ 2022-07-26 11:11 UTC (permalink / raw) To: libc-alpha Cc: Adhemerval Zanella Netto, Florian Weimer, Cristian Rodríguez, Paul Eggert, Mark Harris, Eric Biggers, linux-crypto As before, I'll paste the main function in question standalone so that this is a bit easier to read for those not applying this to an actual tree. void __arc4random_buf (void *p, size_t n) { static bool have_getrandom = true, seen_initialized = false; int fd; if (n == 0) return; for (;;) { ssize_t l; if (!have_getrandom) break; l = __getrandom_nocancel (p, n, 0); if (l > 0) { if ((size_t) l == n) return; /* Done reading, success. */ p = (uint8_t *) p + l; n -= l; continue; /* Interrupted by a signal; keep going. */ } else if (l == 0) arc4random_getrandom_failure (); /* Weird, should never happen. */ else if (l == -EINTR) continue; /* Interrupted by a signal; keep going. */ else if (l == -ENOSYS) { have_getrandom = false; break; /* No syscall, so fallback to /dev/urandom. */ } arc4random_getrandom_failure (); /* Unknown error, should never happen. 
*/ } if (!seen_initialized) { struct pollfd pfd = { .events = POLLIN }; pfd.fd = TEMP_FAILURE_RETRY ( __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY)); if (pfd.fd < 0) arc4random_getrandom_failure (); if (TEMP_FAILURE_RETRY (__poll_nocancel (&pfd, 1, -1)) < 0) arc4random_getrandom_failure (); if (__close_nocancel (pfd.fd) < 0) arc4random_getrandom_failure (); seen_initialized = true; } fd = TEMP_FAILURE_RETRY ( __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY)); if (fd < 0) arc4random_getrandom_failure (); do { ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n)); if (l <= 0) arc4random_getrandom_failure (); p = (uint8_t *) p + l; n -= l; } while (n); if (__close_nocancel (fd) < 0) arc4random_getrandom_failure (); } ^ permalink raw reply [flat|nested] 81+ messages in thread
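For readers who want to try the primary path outside of a glibc tree, here is a minimal sketch of the same retry loop written against the public getrandom() wrapper from <sys/random.h>. It is not the patch itself: fill_random is a hypothetical name, the public wrapper reports errors as -1 plus errno rather than the negative return value that __getrandom_nocancel gives inside libc, and the sketch returns -1 instead of aborting or falling back to /dev/urandom.

```c
#define _GNU_SOURCE             /* For ssize_t and getrandom declarations.  */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper, not part of the patch: fill P with N random bytes
   using getrandom (..., 0).  Returns 0 on success and -1 on any error
   other than EINTR.  */
static int
fill_random (void *p, size_t n)
{
  uint8_t *out = p;

  assert (p != NULL || n == 0);

  while (n > 0)
    {
      ssize_t l = getrandom (out, n, 0);
      if (l > 0)
        {
          /* Possibly a short read; advance and request the remainder.  */
          out += l;
          n -= (size_t) l;
        }
      else if (l < 0 && errno == EINTR)
        continue;               /* Interrupted by a signal; retry.  */
      else
        return -1;              /* ENOSYS, unexpected zero, or other error.  */
    }
  return 0;
}
```

A caller would treat a -1 return as the point where the patch either switches to the /dev/urandom fallback (ENOSYS) or gives up.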
* Re: [PATCH v2] arc4random: simplify design for better safety 2022-07-26 11:04 ` Jason A. Donenfeld 2022-07-26 11:07 ` [PATCH v3] " Jason A. Donenfeld @ 2022-07-26 11:12 ` Florian Weimer 2022-07-26 11:20 ` Jason A. Donenfeld 1 sibling, 1 reply; 81+ messages in thread From: Florian Weimer @ 2022-07-26 11:12 UTC (permalink / raw) To: Jason A. Donenfeld Cc: libc-alpha, Adhemerval Zanella Netto, Cristian Rodríguez, Paul Eggert, linux-crypto * Jason A. Donenfeld: > Hi Florian, > > On Tue, Jul 26, 2022 at 11:55:23AM +0200, Florian Weimer wrote: >> * Jason A. Donenfeld: >> >> > + pfd.fd = TEMP_FAILURE_RETRY ( >> > + __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY)); >> > + if (pfd.fd < 0) >> > + arc4random_getrandom_failure (); >> > + if (__poll (&pfd, 1, -1) < 0) >> > + arc4random_getrandom_failure (); >> > + if (__close_nocancel (pfd.fd) < 0) >> > + arc4random_getrandom_failure (); >> >> What happens if /dev/random is actually /dev/urandom? Will the poll >> call fail? > > Yes. I'm unsure if you're asking this because it'd be a nice > simplification to only have to open one fd, or because you're worried > about confusion. I don't think the confusion problem is one we should > take too seriously, but if you're concerned, we can always fstat and > check the maj/min. Seems a bit much, though. Turning /dev/random into /dev/urandom (e.g. with a symbolic link) used to be the only way to get some applications working because they tried to read from /dev/random at a higher rate than the system was estimating entropy coming in. We may have to do something differently here if the failing poll causes too much breakage. >> Running the benchmark, I see 40% of the time spent in chacha_permute in >> the kernel, that is really quite odd. Why doesn't the system call >> overhead dominate? > > Huh, that is interesting. 
I guess if you're reading 4 bytes for an > integer, it winds up computing a whole chacha block each time, with half > of it doing fast key erasure and half of it being returnable to the > caller. When we later figure out a safer way to buffer, ostensibly this > will go away. But for now, we really should not prematurely optimize. Yeah, I can't really argue against that, given that I said before that I wasn't too worried about the implementation. Thanks, Florian ^ permalink raw reply [flat|nested] 81+ messages in thread
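The fstat check on major/minor numbers mentioned above could look roughly like the following sketch. It assumes the usual Linux device numbers (character device 1:8 for /dev/random, 1:9 for /dev/urandom); the helper names are hypothetical and nothing like this is in the posted patch.

```c
#define _GNU_SOURCE             /* For major, minor and makedev.  */
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Assumption: Linux numbering, where /dev/random is character device 1:8
   (and /dev/urandom is 1:9).  */
static bool
rdev_is_dev_random (dev_t rdev)
{
  return major (rdev) == 1 && minor (rdev) == 8;
}

/* Hypothetical helper: return true only if FD refers to the real
   /dev/random character device, not a symlink target such as
   /dev/urandom or a regular file.  */
static bool
fd_is_dev_random (int fd)
{
  struct stat st;

  if (fstat (fd, &st) != 0)
    return false;
  return S_ISCHR (st.st_mode) && rdev_is_dev_random (st.st_rdev);
}
```

Because fstat follows the already-opened descriptor, a /dev/random that is really a symbolic link to /dev/urandom would show up as 1:9 and be rejected by this check.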
* Re: [PATCH v2] arc4random: simplify design for better safety 2022-07-26 11:12 ` [PATCH v2] " Florian Weimer @ 2022-07-26 11:20 ` Jason A. Donenfeld 2022-07-26 11:35 ` Adhemerval Zanella Netto 0 siblings, 1 reply; 81+ messages in thread From: Jason A. Donenfeld @ 2022-07-26 11:20 UTC (permalink / raw) To: Florian Weimer Cc: libc-alpha, Adhemerval Zanella Netto, Cristian Rodríguez, Paul Eggert, linux-crypto Hey Florian, On Tue, Jul 26, 2022 at 01:12:28PM +0200, Florian Weimer wrote: > >> What happens if /dev/random is actually /dev/urandom? Will the poll > >> call fail? > > > > Yes. I'm unsure if you're asking this because it'd be a nice > > simplification to only have to open one fd, or because you're worried > > about confusion. I don't think the confusion problem is one we should > > take too seriously, but if you're concerned, we can always fstat and > > check the maj/min. Seems a bit much, though. > > Turning /dev/random into /dev/urandom (e.g. with a symbolic link) used > to be the only way to get some applications working because they tried > to read from /dev/random at a higher rate than the system was estimating > entropy coming in. We may have to do something differently here if the > failing poll causes too much breakage. The "backup plan" would be to sleep-loop-read /proc/sys/kernel/random/entropy_avail until it passes a certain threshold one time. This might also work on even older kernels than the poll() trick. But that's pretty darn ugly, so it's not obvious to me where the cut-off in frustration is, when we throw our hands up and decide the ugliness is worth it compared to whatever problems we happen to be facing at the time with the poll() technique. But at least there is an alternative, should we need it. Jason ^ permalink raw reply [flat|nested] 81+ messages in thread
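To make the "backup plan" above concrete, here is a rough sketch of a sleep-loop-read over an entropy_avail-style file. The function names, the 10 ms delay, the bounded retry count, and the threshold parameter are all assumptions chosen for illustration; the thread does not specify any of them, and a real implementation inside libc would use the nocancel primitives rather than stdio.

```c
#define _GNU_SOURCE             /* For nanosleep and struct timespec.  */
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Read the kernel's entropy estimate (in bits) from PATH.
   Returns -1 if the file cannot be opened or parsed.  */
static int
read_entropy_avail (const char *path)
{
  FILE *f = fopen (path, "r");
  int bits;

  if (f == NULL)
    return -1;
  if (fscanf (f, "%d", &bits) != 1)
    bits = -1;
  fclose (f);
  return bits;
}

/* Hypothetical helper: poll PATH until the estimate reaches THRESHOLD
   once, trying at most MAX_TRIES times.  Returns 0 once the threshold
   has been seen and -1 otherwise.  */
static int
wait_for_entropy (const char *path, int threshold, int max_tries)
{
  const struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

  for (int i = 0; i < max_tries; i++)
    {
      int bits = read_entropy_avail (path);
      if (bits < 0)
        return -1;              /* File missing or unreadable.  */
      if (bits >= threshold)
        return 0;
      nanosleep (&delay, NULL); /* Sleep briefly, then read again.  */
    }
  return -1;
}
```

On a real system PATH would be /proc/sys/kernel/random/entropy_avail; each read opens and closes a file, so this shares the file descriptor cost of the poll() approach.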
* Re: [PATCH v2] arc4random: simplify design for better safety 2022-07-26 11:20 ` Jason A. Donenfeld @ 2022-07-26 11:35 ` Adhemerval Zanella Netto 0 siblings, 0 replies; 81+ messages in thread From: Adhemerval Zanella Netto @ 2022-07-26 11:35 UTC (permalink / raw) To: Jason A. Donenfeld, Florian Weimer Cc: libc-alpha, Cristian Rodríguez, Paul Eggert, linux-crypto On 26/07/22 08:20, Jason A. Donenfeld wrote: > Hey Florian, > > On Tue, Jul 26, 2022 at 01:12:28PM +0200, Florian Weimer wrote: >>>> What happens if /dev/random is actually /dev/urandom? Will the poll >>>> call fail? >>> >>> Yes. I'm unsure if you're asking this because it'd be a nice >>> simplification to only have to open one fd, or because you're worried >>> about confusion. I don't think the confusion problem is one we should >>> take too seriously, but if you're concerned, we can always fstat and >>> check the maj/min. Seems a bit much, though. >> >> Turning /dev/random into /dev/urandom (e.g. with a symbolic link) used >> to be the only way to get some applications working because they tried >> to read from /dev/random at a higher rate than the system was estimating >> entropy coming in. We may have to do something differently here if the >> failing poll causes too much breakage. > > The "backup plan" would be to sleep-loop-read /proc/sys/kernel/random/entropy_avail > until it passes a certain threshold one time. This might also work on even older > kernels than the poll() trick. But that's pretty darn ugly, so it's not > obvious to me where the cut-off in frustration is, when we throw our > hands up and decide the ugliness is worth it compared to whatever > problems we happen to be facing at the time with the poll() technique. > But at least there is an alternative, should we need it. I think the poll trick is way better, although I also think it is very Linux specific. Should we move it to Linux sysdeps? 
The /proc/sys/kernel/random/entropy_avail approach would require opening another file descriptor, which I think we should avoid for arc4random if possible. ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v2] arc4random: simplify design for better safety 2022-07-25 23:28 ` [PATCH v2] " Jason A. Donenfeld ` (2 preceding siblings ...) 2022-07-26 9:55 ` Florian Weimer @ 2022-07-26 11:33 ` Adhemerval Zanella Netto 2022-07-26 11:54 ` Jason A. Donenfeld 3 siblings, 1 reply; 81+ messages in thread From: Adhemerval Zanella Netto @ 2022-07-26 11:33 UTC (permalink / raw) To: Jason A. Donenfeld, libc-alpha Cc: Florian Weimer, Cristian Rodríguez, Paul Eggert, linux-crypto On 25/07/22 20:28, Jason A. Donenfeld wrote: > Rather than buffering 16 MiB of entropy in userspace (by way of > chacha20), simply call getrandom() every time. > > This approach is doubtlessly slower, for now, but trying to prematurely > optimize arc4random appears to be leading toward all sorts of nasty > properties and gotchas. Instead, this patch takes a much more > conservative approach. The interface is added as a basic loop wrapper > around getrandom(), and then later, the kernel and libc together can > work together on optimizing that. > > This prevents numerous issues in which userspace is unaware of when it > really must throw away its buffer, since we avoid buffering all > together. Future improvements may include userspace learning more from > the kernel about when to do that, which might make these sorts of > chacha20-based optimizations more possible. The current heuristic of 16 > MiB is meaningless garbage that doesn't correspond to anything the > kernel might know about. So for now, let's just do something > conservative that we know is correct and won't lead to cryptographic > issues for users of this function. > > This patch might be considered along the lines of, "optimization is the > root of all evil," in that the much more complex implementation it > replaces moves too fast without considering security implications, > whereas the incremental approach done here is a much safer way of going > about things. 
Once this lands, we can take our time in optimizing this > properly using new interplay between the kernel and userspace. > > getrandom(0) is used, since that's the one that ensures the bytes > returned are cryptographically secure. But on systems without it, we > fallback to using /dev/urandom. This is unfortunate because it means > opening a file descriptor, but there's not much of a choice. Secondly, > as part of the fallback, in order to get more or less the same > properties of getrandom(0), we poll on /dev/random, and if the poll > succeeds at least once, then we assume the RNG is initialized. This is a > rough approximation, as the ancient "non-blocking pool" initialized > after the "blocking pool", not before, but it's the best approximation > we can do. > > The motivation for including arc4random, in the first place, is to have > source-level compatibility with existing code. That means this patch > doesn't attempt to litigate the interface itself. It does, however, > choose a conservative approach for implementing it. > > Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> > Cc: Florian Weimer <fweimer@redhat.com> > Cc: Cristian Rodríguez <crrodriguez@opensuse.org> > Cc: Paul Eggert <eggert@cs.ucla.edu> > Cc: linux-crypto@vger.kernel.org > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> There are some missing pieces, like the sysdeps/unix/sysv/linux/tls-internal.h comment and the sysdeps/generic/tls-internal-struct.h generic piece (it is used on the Hurd build); maybe also change the NEWS to state this is not a CSPRNG, and we definitely need to update the manual. Some comments below. 
> --- > LICENSES | 23 - > include/stdlib.h | 3 - > stdlib/Makefile | 2 - > stdlib/arc4random.c | 204 ++----- > stdlib/arc4random.h | 48 -- > stdlib/chacha20.c | 191 ------ > stdlib/tst-arc4random-chacha20.c | 167 ----- > sysdeps/aarch64/Makefile | 4 - > sysdeps/aarch64/chacha20-aarch64.S | 314 ---------- > sysdeps/aarch64/chacha20_arch.h | 40 -- > sysdeps/generic/chacha20_arch.h | 24 - > sysdeps/generic/tls-internal.c | 10 - > sysdeps/mach/hurd/_Fork.c | 2 - > sysdeps/nptl/_Fork.c | 2 - > .../powerpc/powerpc64/be/multiarch/Makefile | 4 - > .../powerpc64/be/multiarch/chacha20-ppc.c | 1 - > .../powerpc64/be/multiarch/chacha20_arch.h | 42 -- > sysdeps/powerpc/powerpc64/power8/Makefile | 5 - > .../powerpc/powerpc64/power8/chacha20-ppc.c | 256 -------- > .../powerpc/powerpc64/power8/chacha20_arch.h | 37 -- > sysdeps/s390/s390-64/Makefile | 6 - > sysdeps/s390/s390-64/chacha20-s390x.S | 573 ------------------ > sysdeps/s390/s390-64/chacha20_arch.h | 45 -- > sysdeps/unix/sysv/linux/tls-internal.c | 10 - > sysdeps/x86_64/Makefile | 7 - > sysdeps/x86_64/chacha20-amd64-avx2.S | 328 ---------- > sysdeps/x86_64/chacha20-amd64-sse2.S | 311 ---------- > sysdeps/x86_64/chacha20_arch.h | 55 -- > 28 files changed, 53 insertions(+), 2661 deletions(-) > delete mode 100644 stdlib/arc4random.h > delete mode 100644 stdlib/chacha20.c > delete mode 100644 stdlib/tst-arc4random-chacha20.c > delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S > delete mode 100644 sysdeps/aarch64/chacha20_arch.h > delete mode 100644 sysdeps/generic/chacha20_arch.h > delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile > delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c > delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h > delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c > delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h > delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S > delete mode 100644 
sysdeps/s390/s390-64/chacha20_arch.h > delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S > delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S > delete mode 100644 sysdeps/x86_64/chacha20_arch.h > > diff --git a/LICENSES b/LICENSES > index cd04fb6e84..530893b1dc 100644 > --- a/LICENSES > +++ b/LICENSES > @@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov> > You should have received a copy of the GNU Lesser General Public > License along with this library; if not, see > <https://www.gnu.org/licenses/>. */ > -\f > -sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S, > -sysdeps/x86_64/chacha20-amd64-avx2.S, and > -sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and > -sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt, > -with the following notices: > - > -Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> > - > -This file is part of Libgcrypt. > - > -Libgcrypt is free software; you can redistribute it and/or modify > -it under the terms of the GNU Lesser General Public License as > -published by the Free Software Foundation; either version 2.1 of > -the License, or (at your option) any later version. > - > -Libgcrypt is distributed in the hope that it will be useful, > -but WITHOUT ANY WARRANTY; without even the implied warranty of > -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > -GNU Lesser General Public License for more details. > - > -You should have received a copy of the GNU Lesser General Public > -License along with this program; if not, see <https://www.gnu.org/licenses/>. 
> diff --git a/include/stdlib.h b/include/stdlib.h > index cae7f7cdf8..db51f4a4f6 100644 > --- a/include/stdlib.h > +++ b/include/stdlib.h > @@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform; > libc_hidden_proto (__arc4random_uniform); > extern void __arc4random_buf_internal (void *buffer, size_t len) > attribute_hidden; > -/* Called from the fork function to reinitialize the internal cipher state > - in child process. */ > -extern void __arc4random_fork_subprocess (void) attribute_hidden; > > extern double __strtod_internal (const char *__restrict __nptr, > char **__restrict __endptr, int __group) > diff --git a/stdlib/Makefile b/stdlib/Makefile > index a900962685..f7b25c1981 100644 > --- a/stdlib/Makefile > +++ b/stdlib/Makefile > @@ -246,7 +246,6 @@ tests := \ > # tests > > tests-internal := \ > - tst-arc4random-chacha20 \ > tst-strtod1i \ > tst-strtod3 \ > tst-strtod4 \ > @@ -256,7 +255,6 @@ tests-internal := \ > # tests-internal > > tests-static := \ > - tst-arc4random-chacha20 \ > tst-secure-getenv \ > # tests-static > > diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c > index 65547e79aa..80c55cde63 100644 > --- a/stdlib/arc4random.c > +++ b/stdlib/arc4random.c > @@ -1,4 +1,4 @@ > -/* Pseudo Random Number Generator based on ChaCha20. > +/* Pseudo Random Number Generator > Copyright (C) 2022 Free Software Foundation, Inc. > This file is part of the GNU C Library. > > @@ -16,61 +16,14 @@ > License along with the GNU C Library; if not, see > <https://www.gnu.org/licenses/>. */ > > -#include <arc4random.h> > #include <errno.h> > #include <not-cancel.h> > #include <stdio.h> > #include <stdlib.h> > +#include <sys/poll.h> > #include <sys/mman.h> > #include <sys/param.h> > #include <sys/random.h> > -#include <tls-internal.h> > - > -/* arc4random keeps two counters: 'have' is the current valid bytes not yet > - consumed in 'buf' while 'count' is the maximum number of bytes until a > - reseed. 
> -
> -   Both the initial seed and reseed try to obtain entropy from the kernel
> -   and abort the process if none could be obtained.
> -
> -   The state 'buf' improves the usage of the cipher calls, allowing to call
> -   optimized implementations (if the architecture provides it) and minimize
> -   function call overhead. */
> -
> -#include <chacha20.c>
> -
> -/* Called from the fork function to reset the state. */
> -void
> -__arc4random_fork_subprocess (void)
> -{
> -  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
> -  if (state != NULL)
> -    {
> -      explicit_bzero (state, sizeof (*state));
> -      /* Force key init. */
> -      state->count = -1;
> -    }
> -}
> -
> -/* Return the current thread random state or try to create one if there is
> -   none available. In the case malloc can not allocate a state, arc4random
> -   will try to get entropy with arc4random_getentropy. */
> -static struct arc4random_state_t *
> -arc4random_get_state (void)
> -{
> -  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
> -  if (state == NULL)
> -    {
> -      state = malloc (sizeof (struct arc4random_state_t));
> -      if (state != NULL)
> -        {
> -          /* Force key initialization on first call. */
> -          state->count = -1;
> -          __glibc_tls_internal ()->rand_state = state;
> -        }
> -    }
> -  return state;
> -}
>
>  static void
>  arc4random_getrandom_failure (void)
> @@ -78,106 +31,70 @@ arc4random_getrandom_failure (void)
>    __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
>  }
>
> -static void
> -arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen)
> +void
> +__arc4random_buf (void *p, size_t n)
>  {
> -  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
> +  static bool have_getrandom = true, seen_initialized = false;
> +  int fd;

I think it should be reasonable to assume that the getrandom syscall will
always be supported, and using arc4random in an environment with filtered
getrandom does not make much sense.
We are trying to avoid adding these static syscall checks where possible.
Also, a plain load/store to set the static have_getrandom is strictly a
race condition, although it should not really matter (we use relaxed
load/store in such optimizations; check
sysdeps/unix/sysv/linux/mips/mips64/getdents64.c).

Also, does it make sense to fall back if we build for a kernel that should
always support getrandom?

>
> -  /* Mix optional user provided data. */
> -  if (rnd != NULL)
> -    {
> -      size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> -      for (size_t i = 0; i < m; i++)
> -        state->buf[i] ^= rnd[i];
> -    }
> -
> -  /* Immediately reinit for backtracking resistance. */
> -  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
> -  explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> -  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> -}
> -
> -static void
> -arc4random_getentropy (void *rnd, size_t len)
> -{
> -  if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len)
> +  if (n == 0)
>      return;
>
> -  int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom",
> -                                                  O_RDONLY | O_CLOEXEC));
> -  if (fd != -1)
> +  for (;;)
>      {
> -      uint8_t *p = rnd;
> -      uint8_t *end = p + len;
> -      do
> -        {
> -          ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
> -          if (ret <= 0)
> -            arc4random_getrandom_failure ();
> -          p += ret;
> -        }
> -      while (p < end);
> +      ssize_t l;
>
> -      if (__close_nocancel (fd) == 0)
> -        return;
> -    }
> -  arc4random_getrandom_failure ();
> -}
> +      if (!have_getrandom)
> +        break;
>
> -/* Check if the thread context STATE should be reseed with kernel entropy
> -   depending of requested LEN bytes. If there is less than requested,
> -   the state is either initialized or reseeded, otherwise the internal
> -   counter subtract the requested length. */
> -static void
> -arc4random_check_stir (struct arc4random_state_t *state, size_t len)
> -{
> -  if (state->count <= len || state->count == -1)
> -    {
> -      uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
> -      arc4random_getentropy (rnd, sizeof rnd);
> -
> -      if (state->count == -1)
> -        chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
> -      else
> -        arc4random_rekey (state, rnd, sizeof rnd);
> -
> -      explicit_bzero (rnd, sizeof rnd);
> -
> -      /* Invalidate the buf. */
> -      state->have = 0;
> -      memset (state->buf, 0, sizeof state->buf);
> -      state->count = CHACHA20_RESEED_SIZE;
> +      l = __getrandom_nocancel (p, n, 0);

Do we need to worry about a potentially uncancellable blocking call here?
I guess using GRND_NONBLOCK does not really help.

> +      if (l > 0)
> +        {
> +          if ((size_t) l == n)

Do we need the cast here?

> +            return; /* Done reading, success. */

Minor style issue: use a double space after the period.

> +          p = (uint8_t *) p + l;
> +          n -= l;
> +          continue; /* Interrupted by a signal; keep going. */
> +        }
> +      else if (l == 0)
> +        arc4random_getrandom_failure (); /* Weird, should never happen. */
> +      else if (errno == ENOSYS)
> +        {
> +          have_getrandom = false;
> +          break; /* No syscall, so fallback to /dev/urandom. */
> +        }
> +      arc4random_getrandom_failure (); /* Unknown error, should never happen. */
>      }
> -  else
> -    state->count -= len;
> -}
>
> -void
> -__arc4random_buf (void *buffer, size_t len)
> -{
> -  struct arc4random_state_t *state = arc4random_get_state ();
> -  if (__glibc_unlikely (state == NULL))
> +  if (!seen_initialized)
>      {
> -      arc4random_getentropy (buffer, len);
> -      return;
> +      struct pollfd pfd = { .events = POLLIN };
> +      pfd.fd = TEMP_FAILURE_RETRY (
> +        __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> +      if (pfd.fd < 0)
> +        arc4random_getrandom_failure ();
> +      if (__poll (&pfd, 1, -1) < 0)
> +        arc4random_getrandom_failure ();

As Florian said, we will need a non-cancellable poll here.
Since you are not setting a timeout, I think it would be simple to just add
a non-cancellable wrapper as:

  int __ppoll_noncancel_notimeout (struct pollfd *fds, nfds_t nfds)
  {
  #ifndef __NR_ppoll_time64
  # define __NR_ppoll_time64 __NR_ppoll
  #endif
    return INLINE_SYSCALL_CALL (__NR_ppoll_time64, fds, nfds, NULL, NULL, 0);
  }

So we don't need to handle the timeout for 64-bit time_t wrappers.

> +      if (__close_nocancel (pfd.fd) < 0)
> +        arc4random_getrandom_failure ();
> +      seen_initialized = true;

I think we will need to use relaxed atomics, and maybe set the type to int
(not sure if the atomic wrappers work correctly on bool types on all
architectures).

>      }
>
> -  arc4random_check_stir (state, len);
> -  while (len > 0)
> +  fd = TEMP_FAILURE_RETRY (
> +    __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY));
> +  if (fd < 0)
> +    arc4random_getrandom_failure ();
> +  do
>      {
> -      if (state->have > 0)
> -        {
> -          size_t m = MIN (len, state->have);
> -          uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> -          memcpy (buffer, ks, m);
> -          explicit_bzero (ks, m);
> -          buffer += m;
> -          len -= m;
> -          state->have -= m;
> -        }
> -      if (state->have == 0)
> -        arc4random_rekey (state, NULL, 0);
> +      ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n));
> +      if (l <= 0)
> +        arc4random_getrandom_failure ();
> +      p = (uint8_t *) p + l;
> +      n -= l;
>      }
> +  while (n);
> +  if (__close_nocancel (fd) < 0)
> +    arc4random_getrandom_failure ();
>  }
>  libc_hidden_def (__arc4random_buf)
>  weak_alias (__arc4random_buf, arc4random_buf)
> @@ -186,22 +103,7 @@ uint32_t
>  __arc4random (void)
>  {
>    uint32_t r;
> -
> -  struct arc4random_state_t *state = arc4random_get_state ();
> -  if (__glibc_unlikely (state == NULL))
> -    {
> -      arc4random_getentropy (&r, sizeof (uint32_t));
> -      return r;
> -    }
> -
> -  arc4random_check_stir (state, sizeof (uint32_t));
> -  if (state->have < sizeof (uint32_t))
> -    arc4random_rekey (state, NULL, 0);
> -  uint8_t *ks = state->buf + sizeof (state->buf) -
state->have; > - memcpy (&r, ks, sizeof (uint32_t)); > - memset (ks, 0, sizeof (uint32_t)); > - state->have -= sizeof (uint32_t); > - > + __arc4random_buf (&r, sizeof (r)); > return r; > } > libc_hidden_def (__arc4random) > diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h > deleted file mode 100644 > index cd39389c19..0000000000 > --- a/stdlib/arc4random.h > +++ /dev/null > @@ -1,48 +0,0 @@ > -/* Arc4random definition used on TLS. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#ifndef _CHACHA20_H > -#define _CHACHA20_H > - > -#include <stddef.h> > -#include <stdint.h> > - > -/* Internal ChaCha20 state. */ > -#define CHACHA20_STATE_LEN 16 > -#define CHACHA20_BLOCK_SIZE 64 > - > -/* Maximum number bytes until reseed (16 MB). */ > -#define CHACHA20_RESEED_SIZE (16 * 1024 * 1024) > - > -/* Internal arc4random buffer, used on each feedback step so offer some > - backtracking protection and to allow better used of vectorized > - chacha20 implementations. 
*/ > -#define CHACHA20_BUFSIZE (8 * CHACHA20_BLOCK_SIZE) > - > -_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE, > - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE"); > - > -struct arc4random_state_t > -{ > - uint32_t ctx[CHACHA20_STATE_LEN]; > - size_t have; > - size_t count; > - uint8_t buf[CHACHA20_BUFSIZE]; > -}; > - > -#endif > diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c > deleted file mode 100644 > index 2745a81315..0000000000 > --- a/stdlib/chacha20.c > +++ /dev/null > @@ -1,191 +0,0 @@ > -/* Generic ChaCha20 implementation (used on arc4random). > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#include <array_length.h> > -#include <endian.h> > -#include <stddef.h> > -#include <stdint.h> > -#include <string.h> > - > -/* 32-bit stream position, then 96-bit nonce. */ > -#define CHACHA20_IV_SIZE 16 > -#define CHACHA20_KEY_SIZE 32 > - > -#define CHACHA20_STATE_LEN 16 > - > -/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final > - XOR of the keystream with the plaintext because the plaintext is a > - stream of zeros. 
*/ > - > -enum chacha20_constants > -{ > - CHACHA20_CONSTANT_EXPA = 0x61707865U, > - CHACHA20_CONSTANT_ND_3 = 0x3320646eU, > - CHACHA20_CONSTANT_2_BY = 0x79622d32U, > - CHACHA20_CONSTANT_TE_K = 0x6b206574U > -}; > - > -static inline uint32_t > -read_unaligned_32 (const uint8_t *p) > -{ > - uint32_t r; > - memcpy (&r, p, sizeof (r)); > - return r; > -} > - > -static inline void > -write_unaligned_32 (uint8_t *p, uint32_t v) > -{ > - memcpy (p, &v, sizeof (v)); > -} > - > -#if __BYTE_ORDER == __BIG_ENDIAN > -# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p)) > -# define set_state(v) __builtin_bswap32 ((v)) > -#else > -# define read_unaligned_le32(p) read_unaligned_32 ((p)) > -# define set_state(v) (v) > -#endif > - > -static inline void > -chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv) > -{ > - state[0] = CHACHA20_CONSTANT_EXPA; > - state[1] = CHACHA20_CONSTANT_ND_3; > - state[2] = CHACHA20_CONSTANT_2_BY; > - state[3] = CHACHA20_CONSTANT_TE_K; > - > - state[4] = read_unaligned_le32 (key + 0 * sizeof (uint32_t)); > - state[5] = read_unaligned_le32 (key + 1 * sizeof (uint32_t)); > - state[6] = read_unaligned_le32 (key + 2 * sizeof (uint32_t)); > - state[7] = read_unaligned_le32 (key + 3 * sizeof (uint32_t)); > - state[8] = read_unaligned_le32 (key + 4 * sizeof (uint32_t)); > - state[9] = read_unaligned_le32 (key + 5 * sizeof (uint32_t)); > - state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t)); > - state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t)); > - > - state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t)); > - state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t)); > - state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t)); > - state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t)); > -} > - > -static inline uint32_t > -rotl32 (unsigned int shift, uint32_t word) > -{ > - return (word << (shift & 31)) | (word >> ((-shift) & 31)); > -} > - > -static void > -state_final (const 
uint8_t *src, uint8_t *dst, uint32_t v) > -{ > -#ifdef CHACHA20_XOR_FINAL > - v ^= read_unaligned_32 (src); > -#endif > - write_unaligned_32 (dst, v); > -} > - > -static inline void > -chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src) > -{ > - uint32_t x0, x1, x2, x3, x4, x5, x6, x7; > - uint32_t x8, x9, x10, x11, x12, x13, x14, x15; > - > - x0 = state[0]; > - x1 = state[1]; > - x2 = state[2]; > - x3 = state[3]; > - x4 = state[4]; > - x5 = state[5]; > - x6 = state[6]; > - x7 = state[7]; > - x8 = state[8]; > - x9 = state[9]; > - x10 = state[10]; > - x11 = state[11]; > - x12 = state[12]; > - x13 = state[13]; > - x14 = state[14]; > - x15 = state[15]; > - > - for (int i = 0; i < 20; i += 2) > - { > -#define QROUND(_x0, _x1, _x2, _x3) \ > - do { \ > - _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); \ > - _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); \ > - _x0 = _x0 + _x1; _x3 = rotl32 (8, (_x0 ^ _x3)); \ > - _x2 = _x2 + _x3; _x1 = rotl32 (7, (_x1 ^ _x2)); \ > - } while(0) > - > - QROUND (x0, x4, x8, x12); > - QROUND (x1, x5, x9, x13); > - QROUND (x2, x6, x10, x14); > - QROUND (x3, x7, x11, x15); > - > - QROUND (x0, x5, x10, x15); > - QROUND (x1, x6, x11, x12); > - QROUND (x2, x7, x8, x13); > - QROUND (x3, x4, x9, x14); > - } > - > - state_final (&src[0], &dst[0], set_state (x0 + state[0])); > - state_final (&src[4], &dst[4], set_state (x1 + state[1])); > - state_final (&src[8], &dst[8], set_state (x2 + state[2])); > - state_final (&src[12], &dst[12], set_state (x3 + state[3])); > - state_final (&src[16], &dst[16], set_state (x4 + state[4])); > - state_final (&src[20], &dst[20], set_state (x5 + state[5])); > - state_final (&src[24], &dst[24], set_state (x6 + state[6])); > - state_final (&src[28], &dst[28], set_state (x7 + state[7])); > - state_final (&src[32], &dst[32], set_state (x8 + state[8])); > - state_final (&src[36], &dst[36], set_state (x9 + state[9])); > - state_final (&src[40], &dst[40], set_state (x10 + state[10])); > - state_final 
(&src[44], &dst[44], set_state (x11 + state[11])); > - state_final (&src[48], &dst[48], set_state (x12 + state[12])); > - state_final (&src[52], &dst[52], set_state (x13 + state[13])); > - state_final (&src[56], &dst[56], set_state (x14 + state[14])); > - state_final (&src[60], &dst[60], set_state (x15 + state[15])); > - > - state[12]++; > -} > - > -static void > -__attribute_maybe_unused__ > -chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src, > - size_t bytes) > -{ > - while (bytes >= CHACHA20_BLOCK_SIZE) > - { > - chacha20_block (state, dst, src); > - > - bytes -= CHACHA20_BLOCK_SIZE; > - dst += CHACHA20_BLOCK_SIZE; > - src += CHACHA20_BLOCK_SIZE; > - } > - > - if (__glibc_unlikely (bytes != 0)) > - { > - uint8_t stream[CHACHA20_BLOCK_SIZE]; > - chacha20_block (state, stream, src); > - memcpy (dst, stream, bytes); > - explicit_bzero (stream, sizeof stream); > - } > -} > - > -/* Get the architecture optimized version. */ > -#include <chacha20_arch.h> > diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c > deleted file mode 100644 > index 45ba54920d..0000000000 > --- a/stdlib/tst-arc4random-chacha20.c > +++ /dev/null > @@ -1,167 +0,0 @@ > -/* Basic tests for chacha20 cypher used in arc4random. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. 
> - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#include <arc4random.h> > -#include <support/check.h> > -#include <sys/cdefs.h> > - > -/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random > - actual does. */ > -#include <chacha20.c> > - > -static int > -do_test (void) > -{ > - const uint8_t key[CHACHA20_KEY_SIZE] = > - { > - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, > - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, > - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, > - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, > - }; > - const uint8_t iv[CHACHA20_IV_SIZE] = > - { > - 0x0, 0x0, 0x0, 0x0, > - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, > - }; > - const uint8_t expected1[CHACHA20_BUFSIZE] = > - { > - 0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a, > - 0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d, > - 0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda, > - 0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f, > - 0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1, > - 0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07, > - 0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73, > - 0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69, > - 0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21, > - 0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8, > - 0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac, > - 0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6, > - 0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0, > - 0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3, > - 0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f, > - 0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e, > - 0x73, 0xd2, 0xd3, 0x13, 
0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71, > - 0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3, > - 0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5, > - 0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34, > - 0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d, > - 0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c, > - 0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31, > - 0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad, > - 0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0, > - 0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72, > - 0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16, > - 0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9, > - 0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a, > - 0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc, > - 0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2, > - 0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb, > - 0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10, > - 0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b, > - 0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2, > - 0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17, > - 0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca, > - 0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84, > - 0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3, > - 0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1, > - 0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7, > - 0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9, > - 0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3, > - 0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e, > - 0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 
0xfb, > - 0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8, > - 0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb > - }; > - > - const uint8_t expected2[CHACHA20_BUFSIZE] = > - { > - 0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36, > - 0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22, > - 0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50, > - 0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10, > - 0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef, > - 0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30, > - 0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c, > - 0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6, > - 0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e, > - 0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90, > - 0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3, > - 0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba, > - 0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4, > - 0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea, > - 0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a, > - 0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80, > - 0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2, > - 0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8, > - 0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce, > - 0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb, > - 0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9, > - 0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc, > - 0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86, > - 0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0, > - 0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4, > - 0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1, 
> - 0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4, > - 0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f, > - 0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57, > - 0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97, > - 0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef, > - 0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13, > - 0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65, > - 0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30, > - 0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6, > - 0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee, > - 0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c, > - 0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66, > - 0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a, > - 0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c, > - 0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65, > - 0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b, > - 0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88, > - 0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1, > - 0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0, > - 0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36, > - 0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18 > - }; > - > - /* Check with the expected internal arc4random keystream buffer. Some > - architecture optimizations expects a buffer with a minimum size which > - is a multiple of then ChaCha20 blocksize, so they might not be prepared > - to handle smaller buffers. */ > - > - uint8_t output[CHACHA20_BUFSIZE]; > - > - uint32_t state[CHACHA20_STATE_LEN]; > - chacha20_init (state, key, iv); > - > - /* Check with the initial state. 
*/ > - uint8_t input[CHACHA20_BUFSIZE] = { 0 }; > - > - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); > - TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE); > - > - /* And on the next round. */ > - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); > - TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE); > - > - return 0; > -} > - > -#include <support/test-driver.c> > diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile > index 7dfd1b62dd..17fb1c5b72 100644 > --- a/sysdeps/aarch64/Makefile > +++ b/sysdeps/aarch64/Makefile > @@ -51,10 +51,6 @@ ifeq ($(subdir),csu) > gen-as-const-headers += tlsdesc.sym > endif > > -ifeq ($(subdir),stdlib) > -sysdep_routines += chacha20-aarch64 > -endif > - > ifeq ($(subdir),gmon) > CFLAGS-mcount.c += -mgeneral-regs-only > endif > diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S > deleted file mode 100644 > index cce5291c5c..0000000000 > --- a/sysdeps/aarch64/chacha20-aarch64.S > +++ /dev/null > @@ -1,314 +0,0 @@ > -/* Optimized AArch64 implementation of ChaCha20 cipher. > - Copyright (C) 2022 Free Software Foundation, Inc. > - > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. 
*/ > - > -/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> > - > - This file is part of Libgcrypt. > - > - Libgcrypt is free software; you can redistribute it and/or modify > - it under the terms of the GNU Lesser General Public License as > - published by the Free Software Foundation; either version 2.1 of > - the License, or (at your option) any later version. > - > - Libgcrypt is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - GNU Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with this program; if not, see <https://www.gnu.org/licenses/>. > - */ > - > -/* Based on D. J. Bernstein reference implementation at > - http://cr.yp.to/chacha.html: > - > - chacha-regs.c version 20080118 > - D. J. Bernstein > - Public domain. */ > - > -#include <sysdep.h> > - > -/* Only LE is supported. 
*/ > -#ifdef __AARCH64EL__ > - > -#define GET_DATA_POINTER(reg, name) \ > - adrp reg, name ; \ > - add reg, reg, :lo12:name > - > -/* 'ret' instruction replacement for straight-line speculation mitigation */ > -#define ret_spec_stop \ > - ret; dsb sy; isb; > - > -.cpu generic+simd > - > -.text > - > -/* register macros */ > -#define INPUT x0 > -#define DST x1 > -#define SRC x2 > -#define NBLKS x3 > -#define ROUND x4 > -#define INPUT_CTR x5 > -#define INPUT_POS x6 > -#define CTR x7 > - > -/* vector registers */ > -#define X0 v16 > -#define X4 v17 > -#define X8 v18 > -#define X12 v19 > - > -#define X1 v20 > -#define X5 v21 > - > -#define X9 v22 > -#define X13 v23 > -#define X2 v24 > -#define X6 v25 > - > -#define X3 v26 > -#define X7 v27 > -#define X11 v28 > -#define X15 v29 > - > -#define X10 v30 > -#define X14 v31 > - > -#define VCTR v0 > -#define VTMP0 v1 > -#define VTMP1 v2 > -#define VTMP2 v3 > -#define VTMP3 v4 > -#define X12_TMP v5 > -#define X13_TMP v6 > -#define ROT8 v7 > - > -/********************************************************************** > - helper macros > - **********************************************************************/ > - > -#define _(...) 
__VA_ARGS__ > - > -#define vpunpckldq(s1, s2, dst) \ > - zip1 dst.4s, s2.4s, s1.4s; > - > -#define vpunpckhdq(s1, s2, dst) \ > - zip2 dst.4s, s2.4s, s1.4s; > - > -#define vpunpcklqdq(s1, s2, dst) \ > - zip1 dst.2d, s2.2d, s1.2d; > - > -#define vpunpckhqdq(s1, s2, dst) \ > - zip2 dst.2d, s2.2d, s1.2d; > - > -/* 4x4 32-bit integer matrix transpose */ > -#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \ > - vpunpckhdq(x1, x0, t2); \ > - vpunpckldq(x1, x0, x0); \ > - \ > - vpunpckldq(x3, x2, t1); \ > - vpunpckhdq(x3, x2, x2); \ > - \ > - vpunpckhqdq(t1, x0, x1); \ > - vpunpcklqdq(t1, x0, x0); \ > - \ > - vpunpckhqdq(x2, t2, x3); \ > - vpunpcklqdq(x2, t2, x2); > - > -/********************************************************************** > - 4-way chacha20 > - **********************************************************************/ > - > -#define XOR(d,s1,s2) \ > - eor d.16b, s2.16b, s1.16b; > - > -#define PLUS(ds,s) \ > - add ds.4s, ds.4s, s.4s; > - > -#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \ > - shl dst1.4s, src1.4s, #(c); \ > - shl dst2.4s, src2.4s, #(c); \ > - shl dst3.4s, src3.4s, #(c); \ > - shl dst4.4s, src4.4s, #(c); \ > - sri dst1.4s, src1.4s, #(32 - (c)); \ > - sri dst2.4s, src2.4s, #(32 - (c)); \ > - sri dst3.4s, src3.4s, #(32 - (c)); \ > - sri dst4.4s, src4.4s, #(32 - (c)); > - > -#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ > - tbl dst1.16b, {src1.16b}, ROT8.16b; \ > - tbl dst2.16b, {src2.16b}, ROT8.16b; \ > - tbl dst3.16b, {src3.16b}, ROT8.16b; \ > - tbl dst4.16b, {src4.16b}, ROT8.16b; > - > -#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ > - rev32 dst1.8h, src1.8h; \ > - rev32 dst2.8h, src2.8h; \ > - rev32 dst3.8h, src3.8h; \ > - rev32 dst4.8h, src4.8h; > - > -#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \ > - PLUS(a1,b1); PLUS(a2,b2); \ > - PLUS(a3,b3); PLUS(a4,b4); \ > - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ > - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ > - 
ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4); \ > - PLUS(c1,d1); PLUS(c2,d2); \ > - PLUS(c3,d3); PLUS(c4,d4); \ > - XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \ > - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ > - ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4) \ > - PLUS(a1,b1); PLUS(a2,b2); \ > - PLUS(a3,b3); PLUS(a4,b4); \ > - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ > - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ > - ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4) \ > - PLUS(c1,d1); PLUS(c2,d2); \ > - PLUS(c3,d3); PLUS(c4,d4); \ > - XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \ > - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ > - ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4) \ > - > -.align 4 > -L(__chacha20_blocks4_data_inc_counter): > - .long 0,1,2,3 > - > -.align 4 > -L(__chacha20_blocks4_data_rot8): > - .byte 3,0,1,2 > - .byte 7,4,5,6 > - .byte 11,8,9,10 > - .byte 15,12,13,14 > - > -.hidden __chacha20_neon_blocks4 > -ENTRY (__chacha20_neon_blocks4) > - /* input: > - * x0: input > - * x1: dst > - * x2: src > - * x3: nblks (multiple of 4) > - */ > - > - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8)) > - add INPUT_CTR, INPUT, #(12*4); > - ld1 {ROT8.16b}, [CTR]; > - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter)) > - mov INPUT_POS, INPUT; > - ld1 {VCTR.16b}, [CTR]; > - > -L(loop4): > - /* Construct counter vectors X12 and X13 */ > - > - ld1 {X15.16b}, [INPUT_CTR]; > - mov ROUND, #20; > - ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS]; > - > - dup X12.4s, X15.s[0]; > - dup X13.4s, X15.s[1]; > - ldr CTR, [INPUT_CTR]; > - add X12.4s, X12.4s, VCTR.4s; > - dup X0.4s, VTMP1.s[0]; > - dup X1.4s, VTMP1.s[1]; > - dup X2.4s, VTMP1.s[2]; > - dup X3.4s, VTMP1.s[3]; > - dup X14.4s, X15.s[2]; > - cmhi VTMP0.4s, VCTR.4s, X12.4s; > - dup X15.4s, X15.s[3]; > - add CTR, CTR, #4; /* Update counter */ > - dup X4.4s, VTMP2.s[0]; > - dup X5.4s, VTMP2.s[1]; > - dup X6.4s, VTMP2.s[2]; > - dup X7.4s, VTMP2.s[3]; > - sub X13.4s, X13.4s, VTMP0.4s; > - dup X8.4s, VTMP3.s[0]; > - dup X9.4s, VTMP3.s[1]; > - 
dup X10.4s, VTMP3.s[2]; > - dup X11.4s, VTMP3.s[3]; > - mov X12_TMP.16b, X12.16b; > - mov X13_TMP.16b, X13.16b; > - str CTR, [INPUT_CTR]; > - > -L(round2): > - subs ROUND, ROUND, #2 > - QUARTERROUND4(X0, X4, X8, X12, X1, X5, X9, X13, > - X2, X6, X10, X14, X3, X7, X11, X15, > - tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) > - QUARTERROUND4(X0, X5, X10, X15, X1, X6, X11, X12, > - X2, X7, X8, X13, X3, X4, X9, X14, > - tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) > - b.ne L(round2); > - > - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32; > - > - PLUS(X12, X12_TMP); /* INPUT + 12 * 4 + counter */ > - PLUS(X13, X13_TMP); /* INPUT + 13 * 4 + counter */ > - > - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */ > - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */ > - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */ > - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */ > - PLUS(X0, VTMP2); > - PLUS(X1, VTMP3); > - PLUS(X2, X12_TMP); > - PLUS(X3, X13_TMP); > - > - dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */ > - dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */ > - dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */ > - dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */ > - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS]; > - mov INPUT_POS, INPUT; > - PLUS(X4, VTMP2); > - PLUS(X5, VTMP3); > - PLUS(X6, X12_TMP); > - PLUS(X7, X13_TMP); > - > - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */ > - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */ > - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */ > - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */ > - dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */ > - dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */ > - PLUS(X8, VTMP2); > - PLUS(X9, VTMP3); > - PLUS(X10, X12_TMP); > - PLUS(X11, X13_TMP); > - PLUS(X14, VTMP0); > - PLUS(X15, VTMP1); > - > - transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2); > - transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2); > - transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2); > - transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2); > - > - subs NBLKS, NBLKS, #4; > 
- > - st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64 > - st1 {X1.16b,X5.16b}, [DST], #32; > - st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64 > - st1 {X10.16b,X14.16b}, [DST], #32; > - st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64; > - > - b.ne L(loop4); > - > - ret_spec_stop > -END (__chacha20_neon_blocks4) > - > -#endif > diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h > deleted file mode 100644 > index 37dbb917f1..0000000000 > --- a/sysdeps/aarch64/chacha20_arch.h > +++ /dev/null > @@ -1,40 +0,0 @@ > -/* Chacha20 implementation, used on arc4random. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. 
*/ > - > -#include <ldsodefs.h> > -#include <stdbool.h> > - > -unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t nblks) > - attribute_hidden; > - > -static void > -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, > - size_t bytes) > -{ > - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, > - "CHACHA20_BUFSIZE not multiple of 4"); > - _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4, > - "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4"); > -#ifdef __AARCH64EL__ > - __chacha20_neon_blocks4 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > -#else > - chacha20_crypt_generic (state, dst, src, bytes); > -#endif > -} > diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h > deleted file mode 100644 > index 1b4559ccbc..0000000000 > --- a/sysdeps/generic/chacha20_arch.h > +++ /dev/null > @@ -1,24 +0,0 @@ > -/* Chacha20 implementation, generic interface for encrypt. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. 
*/ > - > -static inline void > -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, > - size_t bytes) > -{ > - chacha20_crypt_generic (state, dst, src, bytes); > -} > diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c > index 8a0f37d509..b32b31b5a9 100644 > --- a/sysdeps/generic/tls-internal.c > +++ b/sysdeps/generic/tls-internal.c > @@ -16,7 +16,6 @@ > License along with the GNU C Library; if not, see > <https://www.gnu.org/licenses/>. */ > > -#include <stdlib/arc4random.h> > #include <string.h> > #include <tls-internal.h> > > @@ -27,13 +26,4 @@ __glibc_tls_internal_free (void) > { > free (__tls_internal.strsignal_buf); > free (__tls_internal.strerror_l_buf); > - > - if (__tls_internal.rand_state != NULL) > - { > - /* Clear any lingering random state prior so if the thread stack is > - cached it won't leak any data. */ > - explicit_bzero (__tls_internal.rand_state, > - sizeof (*__tls_internal.rand_state)); > - free (__tls_internal.rand_state); > - } > } > diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c > index 667068c8cf..e60b86fab1 100644 > --- a/sysdeps/mach/hurd/_Fork.c > +++ b/sysdeps/mach/hurd/_Fork.c > @@ -662,8 +662,6 @@ retry: > _hurd_malloc_fork_child (); > call_function_static_weak (__malloc_fork_unlock_child); > > - call_function_static_weak (__arc4random_fork_subprocess); > - > /* Run things that want to run in the child task to set up. 
*/ > RUN_HOOK (_hurd_fork_child_hook, ()); > > diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c > index 7dc02569f6..dd568992e2 100644 > --- a/sysdeps/nptl/_Fork.c > +++ b/sysdeps/nptl/_Fork.c > @@ -43,8 +43,6 @@ _Fork (void) > self->robust_head.list = &self->robust_head; > INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head, > sizeof (struct robust_list_head)); > - > - call_function_static_weak (__arc4random_fork_subprocess); > } > return pid; > } > diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile > deleted file mode 100644 > index 8c75165f7f..0000000000 > --- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile > +++ /dev/null > @@ -1,4 +0,0 @@ > -ifeq ($(subdir),stdlib) > -sysdep_routines += chacha20-ppc > -CFLAGS-chacha20-ppc.c += -mcpu=power8 > -endif > diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c > deleted file mode 100644 > index cf9e735326..0000000000 > --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c > +++ /dev/null > @@ -1 +0,0 @@ > -#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c> > diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h > deleted file mode 100644 > index 08494dc045..0000000000 > --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h > +++ /dev/null > @@ -1,42 +0,0 @@ > -/* PowerPC optimization for ChaCha20. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. 
> - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#include <stdbool.h> > -#include <ldsodefs.h> > - > -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t nblks) > - attribute_hidden; > - > -static void > -chacha20_crypt (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t bytes) > -{ > - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, > - "CHACHA20_BUFSIZE not multiple of 4"); > - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, > - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); > - > - unsigned long int hwcap = GLRO(dl_hwcap); > - unsigned long int hwcap2 = GLRO(dl_hwcap2); > - if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC) > - __chacha20_power8_blocks4 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > - else > - chacha20_crypt_generic (state, dst, src, bytes); > -} > diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile > index abb0aa3f11..71a59529f3 100644 > --- a/sysdeps/powerpc/powerpc64/power8/Makefile > +++ b/sysdeps/powerpc/powerpc64/power8/Makefile > @@ -1,8 +1,3 @@ > ifeq ($(subdir),string) > sysdep_routines += strcasestr-ppc64 > endif > - > -ifeq ($(subdir),stdlib) > -sysdep_routines += chacha20-ppc > -CFLAGS-chacha20-ppc.c += -mcpu=power8 > -endif > diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c > deleted file mode 100644 > index 0bbdcb9363..0000000000 > --- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c > +++ /dev/null > @@ -1,256 +0,0 @@ > -/* Optimized 
PowerPC implementation of ChaCha20 cipher. > - Copyright (C) 2022 Free Software Foundation, Inc. > - > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20 > - Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> > - > - This file is part of Libgcrypt. > - > - Libgcrypt is free software; you can redistribute it and/or modify > - it under the terms of the GNU Lesser General Public License as > - published by the Free Software Foundation; either version 2.1 of > - the License, or (at your option) any later version. > - > - Libgcrypt is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - GNU Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with this program; if not, see <https://www.gnu.org/licenses/>. 
> - */ > - > -#include <altivec.h> > -#include <endian.h> > -#include <stddef.h> > -#include <stdint.h> > -#include <sys/cdefs.h> > - > -typedef vector unsigned char vector16x_u8; > -typedef vector unsigned int vector4x_u32; > -typedef vector unsigned long long vector2x_u64; > - > -#if __BYTE_ORDER == __BIG_ENDIAN > -static const vector16x_u8 le_bswap_const = > - { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 }; > -#endif > - > -static inline vector4x_u32 > -vec_rol_elems (vector4x_u32 v, unsigned int idx) > -{ > -#if __BYTE_ORDER != __BIG_ENDIAN > - return vec_sld (v, v, (16 - (4 * idx)) & 15); > -#else > - return vec_sld (v, v, (4 * idx) & 15); > -#endif > -} > - > -static inline vector4x_u32 > -vec_load_le (unsigned long offset, const unsigned char *ptr) > -{ > - vector4x_u32 vec; > - vec = vec_vsx_ld (offset, (const uint32_t *)ptr); > -#if __BYTE_ORDER == __BIG_ENDIAN > - vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec, > - le_bswap_const); > -#endif > - return vec; > -} > - > -static inline void > -vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr) > -{ > -#if __BYTE_ORDER == __BIG_ENDIAN > - vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec, > - le_bswap_const); > -#endif > - vec_vsx_st (vec, offset, (uint32_t *)ptr); > -} > - > - > -static inline vector4x_u32 > -vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a) > -{ > -#if __BYTE_ORDER == __BIG_ENDIAN > - static const vector16x_u8 swap32 = > - { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 }; > - vector2x_u64 vec, add, sum; > - > - vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32); > - add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32); > - sum = vec + add; > - return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32); > -#else > - return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a)); > -#endif > -} > - > 
-/********************************************************************** > - 4-way chacha20 > - **********************************************************************/ > - > -#define ROTATE(v1,rolv) \ > - __asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv)) > - > -#define PLUS(ds,s) \ > - ((ds) += (s)) > - > -#define XOR(ds,s) \ > - ((ds) ^= (s)) > - > -#define ADD_U64(v,a) \ > - (v = vec_add_ctr_u64(v, a)) > - > -/* 4x4 32-bit integer matrix transpose */ > -#define transpose_4x4(x0, x1, x2, x3) ({ \ > - vector4x_u32 t1 = vec_mergeh(x0, x2); \ > - vector4x_u32 t2 = vec_mergel(x0, x2); \ > - vector4x_u32 t3 = vec_mergeh(x1, x3); \ > - x3 = vec_mergel(x1, x3); \ > - x0 = vec_mergeh(t1, t3); \ > - x1 = vec_mergel(t1, t3); \ > - x2 = vec_mergeh(t2, x3); \ > - x3 = vec_mergel(t2, x3); \ > - }) > - > -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2) \ > - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ > - ROTATE(d1, rotate_16); ROTATE(d2, rotate_16); \ > - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ > - ROTATE(b1, rotate_12); ROTATE(b2, rotate_12); \ > - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ > - ROTATE(d1, rotate_8); ROTATE(d2, rotate_8); \ > - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ > - ROTATE(b1, rotate_7); ROTATE(b2, rotate_7); > - > -unsigned int attribute_hidden > -__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src, > - size_t nblks) > -{ > - vector4x_u32 counters_0123 = { 0, 1, 2, 3 }; > - vector4x_u32 counter_4 = { 4, 0, 0, 0 }; > - vector4x_u32 rotate_16 = { 16, 16, 16, 16 }; > - vector4x_u32 rotate_12 = { 12, 12, 12, 12 }; > - vector4x_u32 rotate_8 = { 8, 8, 8, 8 }; > - vector4x_u32 rotate_7 = { 7, 7, 7, 7 }; > - vector4x_u32 state0, state1, state2, state3; > - vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7; > - vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15; > - vector4x_u32 tmp; > - int i; > - > - /* Force preload of constants to vector registers. 
*/ > - __asm__ ("": "+v" (counters_0123) :: "memory"); > - __asm__ ("": "+v" (counter_4) :: "memory"); > - __asm__ ("": "+v" (rotate_16) :: "memory"); > - __asm__ ("": "+v" (rotate_12) :: "memory"); > - __asm__ ("": "+v" (rotate_8) :: "memory"); > - __asm__ ("": "+v" (rotate_7) :: "memory"); > - > - state0 = vec_vsx_ld (0 * 16, state); > - state1 = vec_vsx_ld (1 * 16, state); > - state2 = vec_vsx_ld (2 * 16, state); > - state3 = vec_vsx_ld (3 * 16, state); > - > - do > - { > - v0 = vec_splat (state0, 0); > - v1 = vec_splat (state0, 1); > - v2 = vec_splat (state0, 2); > - v3 = vec_splat (state0, 3); > - v4 = vec_splat (state1, 0); > - v5 = vec_splat (state1, 1); > - v6 = vec_splat (state1, 2); > - v7 = vec_splat (state1, 3); > - v8 = vec_splat (state2, 0); > - v9 = vec_splat (state2, 1); > - v10 = vec_splat (state2, 2); > - v11 = vec_splat (state2, 3); > - v12 = vec_splat (state3, 0); > - v13 = vec_splat (state3, 1); > - v14 = vec_splat (state3, 2); > - v15 = vec_splat (state3, 3); > - > - v12 += counters_0123; > - v13 -= vec_cmplt (v12, counters_0123); > - > - for (i = 20; i > 0; i -= 2) > - { > - QUARTERROUND2 (v0, v4, v8, v12, v1, v5, v9, v13) > - QUARTERROUND2 (v2, v6, v10, v14, v3, v7, v11, v15) > - QUARTERROUND2 (v0, v5, v10, v15, v1, v6, v11, v12) > - QUARTERROUND2 (v2, v7, v8, v13, v3, v4, v9, v14) > - } > - > - v0 += vec_splat (state0, 0); > - v1 += vec_splat (state0, 1); > - v2 += vec_splat (state0, 2); > - v3 += vec_splat (state0, 3); > - v4 += vec_splat (state1, 0); > - v5 += vec_splat (state1, 1); > - v6 += vec_splat (state1, 2); > - v7 += vec_splat (state1, 3); > - v8 += vec_splat (state2, 0); > - v9 += vec_splat (state2, 1); > - v10 += vec_splat (state2, 2); > - v11 += vec_splat (state2, 3); > - tmp = vec_splat( state3, 0); > - tmp += counters_0123; > - v12 += tmp; > - v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123); > - v14 += vec_splat (state3, 2); > - v15 += vec_splat (state3, 3); > - ADD_U64 (state3, counter_4); > - > - transpose_4x4 
(v0, v1, v2, v3); > - transpose_4x4 (v4, v5, v6, v7); > - transpose_4x4 (v8, v9, v10, v11); > - transpose_4x4 (v12, v13, v14, v15); > - > - vec_store_le (v0, (64 * 0 + 16 * 0), dst); > - vec_store_le (v1, (64 * 1 + 16 * 0), dst); > - vec_store_le (v2, (64 * 2 + 16 * 0), dst); > - vec_store_le (v3, (64 * 3 + 16 * 0), dst); > - > - vec_store_le (v4, (64 * 0 + 16 * 1), dst); > - vec_store_le (v5, (64 * 1 + 16 * 1), dst); > - vec_store_le (v6, (64 * 2 + 16 * 1), dst); > - vec_store_le (v7, (64 * 3 + 16 * 1), dst); > - > - vec_store_le (v8, (64 * 0 + 16 * 2), dst); > - vec_store_le (v9, (64 * 1 + 16 * 2), dst); > - vec_store_le (v10, (64 * 2 + 16 * 2), dst); > - vec_store_le (v11, (64 * 3 + 16 * 2), dst); > - > - vec_store_le (v12, (64 * 0 + 16 * 3), dst); > - vec_store_le (v13, (64 * 1 + 16 * 3), dst); > - vec_store_le (v14, (64 * 2 + 16 * 3), dst); > - vec_store_le (v15, (64 * 3 + 16 * 3), dst); > - > - src += 4*64; > - dst += 4*64; > - > - nblks -= 4; > - } > - while (nblks); > - > - vec_vsx_st (state3, 3 * 16, state); > - > - return 0; > -} > diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h > deleted file mode 100644 > index ded06762b6..0000000000 > --- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h > +++ /dev/null > @@ -1,37 +0,0 @@ > -/* PowerPC optimization for ChaCha20. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. 
> - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#include <stdbool.h> > -#include <ldsodefs.h> > - > -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t nblks) > - attribute_hidden; > - > -static void > -chacha20_crypt (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t bytes) > -{ > - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, > - "CHACHA20_BUFSIZE not multiple of 4"); > - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, > - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); > - > - __chacha20_power8_blocks4 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > -} > diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile > index 96c110f490..66ed844e68 100644 > --- a/sysdeps/s390/s390-64/Makefile > +++ b/sysdeps/s390/s390-64/Makefile > @@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache > endif > > endif # $(subdir) == elf > - > -ifeq ($(subdir),stdlib) > -sysdep_routines += \ > - chacha20-s390x \ > - # sysdep_routines > -endif > diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S > deleted file mode 100644 > index e38504d370..0000000000 > --- a/sysdeps/s390/s390-64/chacha20-s390x.S > +++ /dev/null > @@ -1,573 +0,0 @@ > -/* Optimized s390x implementation of ChaCha20 cipher. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. 
> - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -/* chacha20-s390x.S - zSeries implementation of ChaCha20 cipher > - > - Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi> > - > - This file is part of Libgcrypt. > - > - Libgcrypt is free software; you can redistribute it and/or modify > - it under the terms of the GNU Lesser General Public License as > - published by the Free Software Foundation; either version 2.1 of > - the License, or (at your option) any later version. > - > - Libgcrypt is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - GNU Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with this program; if not, see <https://www.gnu.org/licenses/>. > - */ > - > -#include <sysdep.h> > - > -#ifdef HAVE_S390_VX_ASM_SUPPORT > - > -/* CFA expressions are used for pointing CFA and registers to > - * SP relative offsets. */ > -# define DW_REGNO_SP 15 > - > -/* Fixed length encoding used for integers for now. 
*/ > -# define DW_SLEB128_7BIT(value) \ > - 0x00|((value) & 0x7f) > -# define DW_SLEB128_28BIT(value) \ > - 0x80|((value)&0x7f), \ > - 0x80|(((value)>>7)&0x7f), \ > - 0x80|(((value)>>14)&0x7f), \ > - 0x00|(((value)>>21)&0x7f) > - > -# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \ > - .cfi_escape \ > - 0x0f, /* DW_CFA_def_cfa_expression */ \ > - DW_SLEB128_7BIT(11), /* length */ \ > - 0x7f, /* DW_OP_breg15, rsp + constant */ \ > - DW_SLEB128_28BIT(rsp_offs), \ > - 0x06, /* DW_OP_deref */ \ > - 0x23, /* DW_OP_plus_constu */ \ > - DW_SLEB128_28BIT((cfa_depth)+160) > - > -.machine "z13+vx" > -.text > - > -.balign 16 > -.Lconsts: > -.Lwordswap: > - .byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3 > -.Lbswap128: > - .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 > -.Lbswap32: > - .byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 > -.Lone: > - .long 0, 0, 0, 1 > -.Ladd_counter_0123: > - .long 0, 1, 2, 3 > -.Ladd_counter_4567: > - .long 4, 5, 6, 7 > - > -/* register macros */ > -#define INPUT %r2 > -#define DST %r3 > -#define SRC %r4 > -#define NBLKS %r0 > -#define ROUND %r1 > - > -/* stack structure */ > - > -#define STACK_FRAME_STD (8 * 16 + 8 * 4) > -#define STACK_FRAME_F8_F15 (8 * 8) > -#define STACK_FRAME_Y0_Y15 (16 * 16) > -#define STACK_FRAME_CTR (4 * 16) > -#define STACK_FRAME_PARAMS (6 * 8) > - > -#define STACK_MAX (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \ > - STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \ > - STACK_FRAME_PARAMS) > - > -#define STACK_F8 (STACK_MAX - STACK_FRAME_F8_F15) > -#define STACK_F9 (STACK_F8 + 8) > -#define STACK_F10 (STACK_F9 + 8) > -#define STACK_F11 (STACK_F10 + 8) > -#define STACK_F12 (STACK_F11 + 8) > -#define STACK_F13 (STACK_F12 + 8) > -#define STACK_F14 (STACK_F13 + 8) > -#define STACK_F15 (STACK_F14 + 8) > -#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15) > -#define STACK_CTR (STACK_Y0_Y15 - STACK_FRAME_CTR) > -#define STACK_INPUT (STACK_CTR - STACK_FRAME_PARAMS) > -#define STACK_DST (STACK_INPUT + 8) > 
-#define STACK_SRC (STACK_DST + 8) > -#define STACK_NBLKS (STACK_SRC + 8) > -#define STACK_POCTX (STACK_NBLKS + 8) > -#define STACK_POSRC (STACK_POCTX + 8) > - > -#define STACK_G0_H3 STACK_Y0_Y15 > - > -/* vector registers */ > -#define A0 %v0 > -#define A1 %v1 > -#define A2 %v2 > -#define A3 %v3 > - > -#define B0 %v4 > -#define B1 %v5 > -#define B2 %v6 > -#define B3 %v7 > - > -#define C0 %v8 > -#define C1 %v9 > -#define C2 %v10 > -#define C3 %v11 > - > -#define D0 %v12 > -#define D1 %v13 > -#define D2 %v14 > -#define D3 %v15 > - > -#define E0 %v16 > -#define E1 %v17 > -#define E2 %v18 > -#define E3 %v19 > - > -#define F0 %v20 > -#define F1 %v21 > -#define F2 %v22 > -#define F3 %v23 > - > -#define G0 %v24 > -#define G1 %v25 > -#define G2 %v26 > -#define G3 %v27 > - > -#define H0 %v28 > -#define H1 %v29 > -#define H2 %v30 > -#define H3 %v31 > - > -#define IO0 E0 > -#define IO1 E1 > -#define IO2 E2 > -#define IO3 E3 > -#define IO4 F0 > -#define IO5 F1 > -#define IO6 F2 > -#define IO7 F3 > - > -#define S0 G0 > -#define S1 G1 > -#define S2 G2 > -#define S3 G3 > - > -#define TMP0 H0 > -#define TMP1 H1 > -#define TMP2 H2 > -#define TMP3 H3 > - > -#define X0 A0 > -#define X1 A1 > -#define X2 A2 > -#define X3 A3 > -#define X4 B0 > -#define X5 B1 > -#define X6 B2 > -#define X7 B3 > -#define X8 C0 > -#define X9 C1 > -#define X10 C2 > -#define X11 C3 > -#define X12 D0 > -#define X13 D1 > -#define X14 D2 > -#define X15 D3 > - > -#define Y0 E0 > -#define Y1 E1 > -#define Y2 E2 > -#define Y3 E3 > -#define Y4 F0 > -#define Y5 F1 > -#define Y6 F2 > -#define Y7 F3 > -#define Y8 G0 > -#define Y9 G1 > -#define Y10 G2 > -#define Y11 G3 > -#define Y12 H0 > -#define Y13 H1 > -#define Y14 H2 > -#define Y15 H3 > - > -/********************************************************************** > - helper macros > - **********************************************************************/ > - > -#define _ /*_*/ > - > -#define START_STACK(last_r) \ > - lgr %r0, %r15; \ > - lghi %r1, ~15; \ > - stmg 
%r6, last_r, 6 * 8(%r15); \ > - aghi %r0, -STACK_MAX; \ > - ngr %r0, %r1; \ > - lgr %r1, %r15; \ > - cfi_def_cfa_register(1); \ > - lgr %r15, %r0; \ > - stg %r1, 0(%r15); \ > - cfi_cfa_on_stack(0, 0); \ > - std %f8, STACK_F8(%r15); \ > - std %f9, STACK_F9(%r15); \ > - std %f10, STACK_F10(%r15); \ > - std %f11, STACK_F11(%r15); \ > - std %f12, STACK_F12(%r15); \ > - std %f13, STACK_F13(%r15); \ > - std %f14, STACK_F14(%r15); \ > - std %f15, STACK_F15(%r15); > - > -#define END_STACK(last_r) \ > - lg %r1, 0(%r15); \ > - ld %f8, STACK_F8(%r15); \ > - ld %f9, STACK_F9(%r15); \ > - ld %f10, STACK_F10(%r15); \ > - ld %f11, STACK_F11(%r15); \ > - ld %f12, STACK_F12(%r15); \ > - ld %f13, STACK_F13(%r15); \ > - ld %f14, STACK_F14(%r15); \ > - ld %f15, STACK_F15(%r15); \ > - lmg %r6, last_r, 6 * 8(%r1); \ > - lgr %r15, %r1; \ > - cfi_def_cfa_register(DW_REGNO_SP); > - > -#define PLUS(dst,src) \ > - vaf dst, dst, src; > - > -#define XOR(dst,src) \ > - vx dst, dst, src; > - > -#define ROTATE(v1,c) \ > - verllf v1, v1, (c)(0); > - > -#define WORD_ROTATE(v1,s) \ > - vsldb v1, v1, v1, ((s) * 4); > - > -#define DST_8(OPER, I, J) \ > - OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \ > - OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J); > - > -/********************************************************************** > - round macros > - **********************************************************************/ > - > -/********************************************************************** > - 8-way chacha20 ("vertical") > - **********************************************************************/ > - > -#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ > - x8,x9,x10,x11,x12,x13,x14,x15,\ > - y0,y1,y2,y3,y4,y5,y6,y7,\ > - y8,y9,y10,y11,y12,y13,y14,y15,\ > - op1,op2,op3,op4,op5,op6,op7,op8,\ > - op9,op10,op11,op12) \ > - op1; \ > - PLUS(x0, x1); PLUS(x4, x5); \ > - PLUS(x8, x9); PLUS(x12, x13); \ > - PLUS(y0, y1); PLUS(y4, y5); \ > - PLUS(y8, y9); PLUS(y12, y13); 
\ > - op2; \ > - XOR(x3, x0); XOR(x7, x4); \ > - XOR(x11, x8); XOR(x15, x12); \ > - XOR(y3, y0); XOR(y7, y4); \ > - XOR(y11, y8); XOR(y15, y12); \ > - op3; \ > - ROTATE(x3, 16); ROTATE(x7, 16); \ > - ROTATE(x11, 16); ROTATE(x15, 16); \ > - ROTATE(y3, 16); ROTATE(y7, 16); \ > - ROTATE(y11, 16); ROTATE(y15, 16); \ > - op4; \ > - PLUS(x2, x3); PLUS(x6, x7); \ > - PLUS(x10, x11); PLUS(x14, x15); \ > - PLUS(y2, y3); PLUS(y6, y7); \ > - PLUS(y10, y11); PLUS(y14, y15); \ > - op5; \ > - XOR(x1, x2); XOR(x5, x6); \ > - XOR(x9, x10); XOR(x13, x14); \ > - XOR(y1, y2); XOR(y5, y6); \ > - XOR(y9, y10); XOR(y13, y14); \ > - op6; \ > - ROTATE(x1,12); ROTATE(x5,12); \ > - ROTATE(x9,12); ROTATE(x13,12); \ > - ROTATE(y1,12); ROTATE(y5,12); \ > - ROTATE(y9,12); ROTATE(y13,12); \ > - op7; \ > - PLUS(x0, x1); PLUS(x4, x5); \ > - PLUS(x8, x9); PLUS(x12, x13); \ > - PLUS(y0, y1); PLUS(y4, y5); \ > - PLUS(y8, y9); PLUS(y12, y13); \ > - op8; \ > - XOR(x3, x0); XOR(x7, x4); \ > - XOR(x11, x8); XOR(x15, x12); \ > - XOR(y3, y0); XOR(y7, y4); \ > - XOR(y11, y8); XOR(y15, y12); \ > - op9; \ > - ROTATE(x3,8); ROTATE(x7,8); \ > - ROTATE(x11,8); ROTATE(x15,8); \ > - ROTATE(y3,8); ROTATE(y7,8); \ > - ROTATE(y11,8); ROTATE(y15,8); \ > - op10; \ > - PLUS(x2, x3); PLUS(x6, x7); \ > - PLUS(x10, x11); PLUS(x14, x15); \ > - PLUS(y2, y3); PLUS(y6, y7); \ > - PLUS(y10, y11); PLUS(y14, y15); \ > - op11; \ > - XOR(x1, x2); XOR(x5, x6); \ > - XOR(x9, x10); XOR(x13, x14); \ > - XOR(y1, y2); XOR(y5, y6); \ > - XOR(y9, y10); XOR(y13, y14); \ > - op12; \ > - ROTATE(x1,7); ROTATE(x5,7); \ > - ROTATE(x9,7); ROTATE(x13,7); \ > - ROTATE(y1,7); ROTATE(y5,7); \ > - ROTATE(y9,7); ROTATE(y13,7); > - > -#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\ > - y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \ > - QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ > - x8,x9,x10,x11,x12,x13,x14,x15,\ > - y0,y1,y2,y3,y4,y5,y6,y7,\ > - y8,y9,y10,y11,y12,y13,y14,y15,\ > - ,,,,,,,,,,,) > - > 
-#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \ > - vmrhf tmp0, v0, v1; \ > - vmrhf tmp1, v2, v3; \ > - vmrlf tmp2, v0, v1; \ > - vmrlf v3, v2, v3; \ > - vmrhf tmpa, va, vb; \ > - vmrhf tmpb, vc, vd; \ > - vmrlf tmpc, va, vb; \ > - vmrlf vd, vc, vd; \ > - vpdi v0, tmp0, tmp1, 0; \ > - vpdi v1, tmp0, tmp1, 5; \ > - vpdi v2, tmp2, v3, 0; \ > - vpdi v3, tmp2, v3, 5; \ > - vpdi va, tmpa, tmpb, 0; \ > - vpdi vb, tmpa, tmpb, 5; \ > - vpdi vc, tmpc, vd, 0; \ > - vpdi vd, tmpc, vd, 5; > - > -.balign 8 > -.globl __chacha20_s390x_vx_blocks8 > -ENTRY (__chacha20_s390x_vx_blocks8) > - /* input: > - * %r2: input > - * %r3: dst > - * %r4: src > - * %r5: nblks (multiple of 8) > - */ > - > - START_STACK(%r8); > - lgr NBLKS, %r5; > - > - larl %r7, .Lconsts; > - > - /* Load counter. */ > - lg %r8, (12 * 4)(INPUT); > - rllg %r8, %r8, 32; > - > -.balign 4 > - /* Process eight chacha20 blocks per loop. */ > -.Lloop8: > - vlm Y0, Y3, 0(INPUT); > - > - slgfi NBLKS, 8; > - lghi ROUND, (20 / 2); > - > - /* Construct counter vectors X12/X13 & Y12/Y13. */ > - vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7); > - vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7); > - vrepf Y12, Y3, 0; > - vrepf Y13, Y3, 1; > - vaccf X5, Y12, X4; > - vaccf Y5, Y12, Y4; > - vaf X12, Y12, X4; > - vaf Y12, Y12, Y4; > - vaf X13, Y13, X5; > - vaf Y13, Y13, Y5; > - > - vrepf X0, Y0, 0; > - vrepf X1, Y0, 1; > - vrepf X2, Y0, 2; > - vrepf X3, Y0, 3; > - vrepf X4, Y1, 0; > - vrepf X5, Y1, 1; > - vrepf X6, Y1, 2; > - vrepf X7, Y1, 3; > - vrepf X8, Y2, 0; > - vrepf X9, Y2, 1; > - vrepf X10, Y2, 2; > - vrepf X11, Y2, 3; > - vrepf X14, Y3, 2; > - vrepf X15, Y3, 3; > - > - /* Store counters for blocks 0-7. 
*/ > - vstm X12, X13, (STACK_CTR + 0 * 16)(%r15); > - vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15); > - > - vlr Y0, X0; > - vlr Y1, X1; > - vlr Y2, X2; > - vlr Y3, X3; > - vlr Y4, X4; > - vlr Y5, X5; > - vlr Y6, X6; > - vlr Y7, X7; > - vlr Y8, X8; > - vlr Y9, X9; > - vlr Y10, X10; > - vlr Y11, X11; > - vlr Y14, X14; > - vlr Y15, X15; > - > - /* Update and store counter. */ > - agfi %r8, 8; > - rllg %r5, %r8, 32; > - stg %r5, (12 * 4)(INPUT); > - > -.balign 4 > -.Lround2_8: > - QUARTERROUND4_V8(X0, X4, X8, X12, X1, X5, X9, X13, > - X2, X6, X10, X14, X3, X7, X11, X15, > - Y0, Y4, Y8, Y12, Y1, Y5, Y9, Y13, > - Y2, Y6, Y10, Y14, Y3, Y7, Y11, Y15); > - QUARTERROUND4_V8(X0, X5, X10, X15, X1, X6, X11, X12, > - X2, X7, X8, X13, X3, X4, X9, X14, > - Y0, Y5, Y10, Y15, Y1, Y6, Y11, Y12, > - Y2, Y7, Y8, Y13, Y3, Y4, Y9, Y14); > - brctg ROUND, .Lround2_8; > - > - /* Store blocks 4-7. */ > - vstm Y0, Y15, STACK_Y0_Y15(%r15); > - > - /* Load counters for blocks 0-3. */ > - vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15); > - > - lghi ROUND, 1; > - j .Lfirst_output_4blks_8; > - > -.balign 4 > -.Lsecond_output_4blks_8: > - /* Load blocks 4-7. */ > - vlm X0, X15, STACK_Y0_Y15(%r15); > - > - /* Load counters for blocks 4-7. */ > - vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15); > - > - lghi ROUND, 0; > - > -.balign 4 > - /* Output four chacha20 blocks per loop. 
*/ > -.Lfirst_output_4blks_8: > - vlm Y12, Y15, 0(INPUT); > - PLUS(X12, Y0); > - PLUS(X13, Y1); > - vrepf Y0, Y12, 0; > - vrepf Y1, Y12, 1; > - vrepf Y2, Y12, 2; > - vrepf Y3, Y12, 3; > - vrepf Y4, Y13, 0; > - vrepf Y5, Y13, 1; > - vrepf Y6, Y13, 2; > - vrepf Y7, Y13, 3; > - vrepf Y8, Y14, 0; > - vrepf Y9, Y14, 1; > - vrepf Y10, Y14, 2; > - vrepf Y11, Y14, 3; > - vrepf Y14, Y15, 2; > - vrepf Y15, Y15, 3; > - PLUS(X0, Y0); > - PLUS(X1, Y1); > - PLUS(X2, Y2); > - PLUS(X3, Y3); > - PLUS(X4, Y4); > - PLUS(X5, Y5); > - PLUS(X6, Y6); > - PLUS(X7, Y7); > - PLUS(X8, Y8); > - PLUS(X9, Y9); > - PLUS(X10, Y10); > - PLUS(X11, Y11); > - PLUS(X14, Y14); > - PLUS(X15, Y15); > - > - vl Y15, (.Lbswap32 - .Lconsts)(%r7); > - TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7, > - Y9, Y10, Y11, Y12, Y13, Y14); > - TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15, > - Y9, Y10, Y11, Y12, Y13, Y14); > - > - vlm Y0, Y14, 0(SRC); > - vperm X0, X0, X0, Y15; > - vperm X1, X1, X1, Y15; > - vperm X2, X2, X2, Y15; > - vperm X3, X3, X3, Y15; > - vperm X4, X4, X4, Y15; > - vperm X5, X5, X5, Y15; > - vperm X6, X6, X6, Y15; > - vperm X7, X7, X7, Y15; > - vperm X8, X8, X8, Y15; > - vperm X9, X9, X9, Y15; > - vperm X10, X10, X10, Y15; > - vperm X11, X11, X11, Y15; > - vperm X12, X12, X12, Y15; > - vperm X13, X13, X13, Y15; > - vperm X14, X14, X14, Y15; > - vperm X15, X15, X15, Y15; > - vl Y15, (15 * 16)(SRC); > - > - XOR(Y0, X0); > - XOR(Y1, X4); > - XOR(Y2, X8); > - XOR(Y3, X12); > - XOR(Y4, X1); > - XOR(Y5, X5); > - XOR(Y6, X9); > - XOR(Y7, X13); > - XOR(Y8, X2); > - XOR(Y9, X6); > - XOR(Y10, X10); > - XOR(Y11, X14); > - XOR(Y12, X3); > - XOR(Y13, X7); > - XOR(Y14, X11); > - XOR(Y15, X15); > - vstm Y0, Y15, 0(DST); > - > - aghi SRC, 256; > - aghi DST, 256; > - > - clgije ROUND, 1, .Lsecond_output_4blks_8; > - > - clgijhe NBLKS, 8, .Lloop8; > - > - > - END_STACK(%r8); > - xgr %r2, %r2; > - br %r14; > -END (__chacha20_s390x_vx_blocks8) > - > -#endif /* HAVE_S390_VX_ASM_SUPPORT */ > diff --git 
a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h > deleted file mode 100644 > index 0c6abf77e8..0000000000 > --- a/sysdeps/s390/s390-64/chacha20_arch.h > +++ /dev/null > @@ -1,45 +0,0 @@ > -/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. 
*/ > - > -#include <stdbool.h> > -#include <ldsodefs.h> > -#include <sys/auxv.h> > - > -unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t nblks) > - attribute_hidden; > - > -static inline void > -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, > - size_t bytes) > -{ > -#ifdef HAVE_S390_VX_ASM_SUPPORT > - _Static_assert (CHACHA20_BUFSIZE % 8 == 0, > - "CHACHA20_BUFSIZE not multiple of 8"); > - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, > - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); > - > - if (GLRO(dl_hwcap) & HWCAP_S390_VX) > - { > - __chacha20_s390x_vx_blocks8 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > - return; > - } > -#endif > - chacha20_crypt_generic (state, dst, src, bytes); > -} > diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c > index 0326ebb767..c8a9ed2d40 100644 > --- a/sysdeps/unix/sysv/linux/tls-internal.c > +++ b/sysdeps/unix/sysv/linux/tls-internal.c > @@ -16,7 +16,6 @@ > License along with the GNU C Library; if not, see > <https://www.gnu.org/licenses/>. */ > > -#include <stdlib/arc4random.h> > #include <string.h> > #include <tls-internal.h> > > @@ -26,13 +25,4 @@ __glibc_tls_internal_free (void) > struct pthread *self = THREAD_SELF; > free (self->tls_state.strsignal_buf); > free (self->tls_state.strerror_l_buf); > - > - if (self->tls_state.rand_state != NULL) > - { > - /* Clear any lingering random state prior so if the thread stack is > - cached it won't leak any data. 
*/ > - explicit_bzero (self->tls_state.rand_state, > - sizeof (*self->tls_state.rand_state)); > - free (self->tls_state.rand_state); > - } > } > diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile > index 1178475d75..c19bef2dec 100644 > --- a/sysdeps/x86_64/Makefile > +++ b/sysdeps/x86_64/Makefile > @@ -5,13 +5,6 @@ ifeq ($(subdir),csu) > gen-as-const-headers += link-defines.sym > endif > > -ifeq ($(subdir),stdlib) > -sysdep_routines += \ > - chacha20-amd64-sse2 \ > - chacha20-amd64-avx2 \ > - # sysdep_routines > -endif > - > ifeq ($(subdir),gmon) > sysdep_routines += _mcount > # We cannot compile _mcount.S with -pg because that would create > diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S > deleted file mode 100644 > index aefd1cdbd0..0000000000 > --- a/sysdeps/x86_64/chacha20-amd64-avx2.S > +++ /dev/null > @@ -1,328 +0,0 @@ > -/* Optimized AVX2 implementation of ChaCha20 cipher. > - Copyright (C) 2022 Free Software Foundation, Inc. > - > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -/* chacha20-amd64-avx2.S - AVX2 implementation of ChaCha20 cipher > - > - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> > - > - This file is part of Libgcrypt. 
> - > - Libgcrypt is free software; you can redistribute it and/or modify > - it under the terms of the GNU Lesser General Public License as > - published by the Free Software Foundation; either version 2.1 of > - the License, or (at your option) any later version. > - > - Libgcrypt is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - GNU Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with this program; if not, see <https://www.gnu.org/licenses/>. > -*/ > - > -/* Based on D. J. Bernstein reference implementation at > - http://cr.yp.to/chacha.html: > - > - chacha-regs.c version 20080118 > - D. J. Bernstein > - Public domain. */ > - > -#include <sysdep.h> > - > -#ifdef PIC > -# define rRIP (%rip) > -#else > -# define rRIP > -#endif > - > -/* register macros */ > -#define INPUT %rdi > -#define DST %rsi > -#define SRC %rdx > -#define NBLKS %rcx > -#define ROUND %eax > - > -/* stack structure */ > -#define STACK_VEC_X12 (32) > -#define STACK_VEC_X13 (32 + STACK_VEC_X12) > -#define STACK_TMP (32 + STACK_VEC_X13) > -#define STACK_TMP1 (32 + STACK_TMP) > - > -#define STACK_MAX (32 + STACK_TMP1) > - > -/* vector registers */ > -#define X0 %ymm0 > -#define X1 %ymm1 > -#define X2 %ymm2 > -#define X3 %ymm3 > -#define X4 %ymm4 > -#define X5 %ymm5 > -#define X6 %ymm6 > -#define X7 %ymm7 > -#define X8 %ymm8 > -#define X9 %ymm9 > -#define X10 %ymm10 > -#define X11 %ymm11 > -#define X12 %ymm12 > -#define X13 %ymm13 > -#define X14 %ymm14 > -#define X15 %ymm15 > - > -#define X0h %xmm0 > -#define X1h %xmm1 > -#define X2h %xmm2 > -#define X3h %xmm3 > -#define X4h %xmm4 > -#define X5h %xmm5 > -#define X6h %xmm6 > -#define X7h %xmm7 > -#define X8h %xmm8 > -#define X9h %xmm9 > -#define X10h %xmm10 > -#define X11h %xmm11 > -#define X12h %xmm12 > -#define X13h %xmm13 > 
-#define X14h %xmm14 > -#define X15h %xmm15 > - > -/********************************************************************** > - helper macros > - **********************************************************************/ > - > -/* 4x4 32-bit integer matrix transpose */ > -#define transpose_4x4(x0,x1,x2,x3,t1,t2) \ > - vpunpckhdq x1, x0, t2; \ > - vpunpckldq x1, x0, x0; \ > - \ > - vpunpckldq x3, x2, t1; \ > - vpunpckhdq x3, x2, x2; \ > - \ > - vpunpckhqdq t1, x0, x1; \ > - vpunpcklqdq t1, x0, x0; \ > - \ > - vpunpckhqdq x2, t2, x3; \ > - vpunpcklqdq x2, t2, x2; > - > -/* 2x2 128-bit matrix transpose */ > -#define transpose_16byte_2x2(x0,x1,t1) \ > - vmovdqa x0, t1; \ > - vperm2i128 $0x20, x1, x0, x0; \ > - vperm2i128 $0x31, x1, t1, x1; > - > -/********************************************************************** > - 8-way chacha20 > - **********************************************************************/ > - > -#define ROTATE2(v1,v2,c,tmp) \ > - vpsrld $(32 - (c)), v1, tmp; \ > - vpslld $(c), v1, v1; \ > - vpaddb tmp, v1, v1; \ > - vpsrld $(32 - (c)), v2, tmp; \ > - vpslld $(c), v2, v2; \ > - vpaddb tmp, v2, v2; > - > -#define ROTATE_SHUF_2(v1,v2,shuf) \ > - vpshufb shuf, v1, v1; \ > - vpshufb shuf, v2, v2; > - > -#define XOR(ds,s) \ > - vpxor s, ds, ds; > - > -#define PLUS(ds,s) \ > - vpaddd s, ds, ds; > - > -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\ > - interleave_op1,interleave_op2,\ > - interleave_op3,interleave_op4) \ > - vbroadcasti128 .Lshuf_rol16 rRIP, tmp1; \ > - interleave_op1; \ > - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ > - ROTATE_SHUF_2(d1, d2, tmp1); \ > - interleave_op2; \ > - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ > - ROTATE2(b1, b2, 12, tmp1); \ > - vbroadcasti128 .Lshuf_rol8 rRIP, tmp1; \ > - interleave_op3; \ > - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ > - ROTATE_SHUF_2(d1, d2, tmp1); \ > - interleave_op4; \ > - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ > - ROTATE2(b1, b2, 7, tmp1); > - 
> - .section .text.avx2, "ax", @progbits > - .align 32 > -chacha20_data: > -L(shuf_rol16): > - .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13 > -L(shuf_rol8): > - .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14 > -L(inc_counter): > - .byte 0,1,2,3,4,5,6,7 > -L(unsigned_cmp): > - .long 0x80000000 > - > - .hidden __chacha20_avx2_blocks8 > -ENTRY (__chacha20_avx2_blocks8) > - /* input: > - * %rdi: input > - * %rsi: dst > - * %rdx: src > - * %rcx: nblks (multiple of 8) > - */ > - vzeroupper; > - > - pushq %rbp; > - cfi_adjust_cfa_offset(8); > - cfi_rel_offset(rbp, 0) > - movq %rsp, %rbp; > - cfi_def_cfa_register(rbp); > - > - subq $STACK_MAX, %rsp; > - andq $~31, %rsp; > - > -L(loop8): > - mov $20, ROUND; > - > - /* Construct counter vectors X12 and X13 */ > - vpmovzxbd L(inc_counter) rRIP, X0; > - vpbroadcastd L(unsigned_cmp) rRIP, X2; > - vpbroadcastd (12 * 4)(INPUT), X12; > - vpbroadcastd (13 * 4)(INPUT), X13; > - vpaddd X0, X12, X12; > - vpxor X2, X0, X0; > - vpxor X2, X12, X1; > - vpcmpgtd X1, X0, X0; > - vpsubd X0, X13, X13; > - vmovdqa X12, (STACK_VEC_X12)(%rsp); > - vmovdqa X13, (STACK_VEC_X13)(%rsp); > - > - /* Load vectors */ > - vpbroadcastd (0 * 4)(INPUT), X0; > - vpbroadcastd (1 * 4)(INPUT), X1; > - vpbroadcastd (2 * 4)(INPUT), X2; > - vpbroadcastd (3 * 4)(INPUT), X3; > - vpbroadcastd (4 * 4)(INPUT), X4; > - vpbroadcastd (5 * 4)(INPUT), X5; > - vpbroadcastd (6 * 4)(INPUT), X6; > - vpbroadcastd (7 * 4)(INPUT), X7; > - vpbroadcastd (8 * 4)(INPUT), X8; > - vpbroadcastd (9 * 4)(INPUT), X9; > - vpbroadcastd (10 * 4)(INPUT), X10; > - vpbroadcastd (11 * 4)(INPUT), X11; > - vpbroadcastd (14 * 4)(INPUT), X14; > - vpbroadcastd (15 * 4)(INPUT), X15; > - vmovdqa X15, (STACK_TMP)(%rsp); > - > -L(round2): > - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X15,,,,) > - vmovdqa (STACK_TMP)(%rsp), X15; > - vmovdqa X8, (STACK_TMP)(%rsp); > - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,,,,) > - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,,,,) > 
- vmovdqa (STACK_TMP)(%rsp), X8; > - vmovdqa X15, (STACK_TMP)(%rsp); > - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X15,,,,) > - sub $2, ROUND; > - jnz L(round2); > - > - vmovdqa X8, (STACK_TMP1)(%rsp); > - > - /* tmp := X15 */ > - vpbroadcastd (0 * 4)(INPUT), X15; > - PLUS(X0, X15); > - vpbroadcastd (1 * 4)(INPUT), X15; > - PLUS(X1, X15); > - vpbroadcastd (2 * 4)(INPUT), X15; > - PLUS(X2, X15); > - vpbroadcastd (3 * 4)(INPUT), X15; > - PLUS(X3, X15); > - vpbroadcastd (4 * 4)(INPUT), X15; > - PLUS(X4, X15); > - vpbroadcastd (5 * 4)(INPUT), X15; > - PLUS(X5, X15); > - vpbroadcastd (6 * 4)(INPUT), X15; > - PLUS(X6, X15); > - vpbroadcastd (7 * 4)(INPUT), X15; > - PLUS(X7, X15); > - transpose_4x4(X0, X1, X2, X3, X8, X15); > - transpose_4x4(X4, X5, X6, X7, X8, X15); > - vmovdqa (STACK_TMP1)(%rsp), X8; > - transpose_16byte_2x2(X0, X4, X15); > - transpose_16byte_2x2(X1, X5, X15); > - transpose_16byte_2x2(X2, X6, X15); > - transpose_16byte_2x2(X3, X7, X15); > - vmovdqa (STACK_TMP)(%rsp), X15; > - vmovdqu X0, (64 * 0 + 16 * 0)(DST) > - vmovdqu X1, (64 * 1 + 16 * 0)(DST) > - vpbroadcastd (8 * 4)(INPUT), X0; > - PLUS(X8, X0); > - vpbroadcastd (9 * 4)(INPUT), X0; > - PLUS(X9, X0); > - vpbroadcastd (10 * 4)(INPUT), X0; > - PLUS(X10, X0); > - vpbroadcastd (11 * 4)(INPUT), X0; > - PLUS(X11, X0); > - vmovdqa (STACK_VEC_X12)(%rsp), X0; > - PLUS(X12, X0); > - vmovdqa (STACK_VEC_X13)(%rsp), X0; > - PLUS(X13, X0); > - vpbroadcastd (14 * 4)(INPUT), X0; > - PLUS(X14, X0); > - vpbroadcastd (15 * 4)(INPUT), X0; > - PLUS(X15, X0); > - vmovdqu X2, (64 * 2 + 16 * 0)(DST) > - vmovdqu X3, (64 * 3 + 16 * 0)(DST) > - > - /* Update counter */ > - addq $8, (12 * 4)(INPUT); > - > - transpose_4x4(X8, X9, X10, X11, X0, X1); > - transpose_4x4(X12, X13, X14, X15, X0, X1); > - vmovdqu X4, (64 * 4 + 16 * 0)(DST) > - vmovdqu X5, (64 * 5 + 16 * 0)(DST) > - transpose_16byte_2x2(X8, X12, X0); > - transpose_16byte_2x2(X9, X13, X0); > - transpose_16byte_2x2(X10, X14, X0); > - 
transpose_16byte_2x2(X11, X15, X0); > - vmovdqu X6, (64 * 6 + 16 * 0)(DST) > - vmovdqu X7, (64 * 7 + 16 * 0)(DST) > - vmovdqu X8, (64 * 0 + 16 * 2)(DST) > - vmovdqu X9, (64 * 1 + 16 * 2)(DST) > - vmovdqu X10, (64 * 2 + 16 * 2)(DST) > - vmovdqu X11, (64 * 3 + 16 * 2)(DST) > - vmovdqu X12, (64 * 4 + 16 * 2)(DST) > - vmovdqu X13, (64 * 5 + 16 * 2)(DST) > - vmovdqu X14, (64 * 6 + 16 * 2)(DST) > - vmovdqu X15, (64 * 7 + 16 * 2)(DST) > - > - sub $8, NBLKS; > - lea (8 * 64)(DST), DST; > - lea (8 * 64)(SRC), SRC; > - jnz L(loop8); > - > - vzeroupper; > - > - /* eax zeroed by round loop. */ > - leave; > - cfi_adjust_cfa_offset(-8) > - cfi_def_cfa_register(%rsp); > - ret; > - int3; > -END(__chacha20_avx2_blocks8) > diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S > deleted file mode 100644 > index 351a1109c6..0000000000 > --- a/sysdeps/x86_64/chacha20-amd64-sse2.S > +++ /dev/null > @@ -1,311 +0,0 @@ > -/* Optimized SSE2 implementation of ChaCha20 cipher. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -/* chacha20-amd64-ssse3.S - SSSE3 implementation of ChaCha20 cipher > - > - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> > - > - This file is part of Libgcrypt. 
> - > - Libgcrypt is free software; you can redistribute it and/or modify > - it under the terms of the GNU Lesser General Public License as > - published by the Free Software Foundation; either version 2.1 of > - the License, or (at your option) any later version. > - > - Libgcrypt is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - GNU Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with this program; if not, see <https://www.gnu.org/licenses/>. > -*/ > - > -/* Based on D. J. Bernstein reference implementation at > - http://cr.yp.to/chacha.html: > - > - chacha-regs.c version 20080118 > - D. J. Bernstein > - Public domain. */ > - > -#include <sysdep.h> > -#include <isa-level.h> > - > -#if MINIMUM_X86_ISA_LEVEL <= 2 > - > -#ifdef PIC > -# define rRIP (%rip) > -#else > -# define rRIP > -#endif > - > -/* 'ret' instruction replacement for straight-line speculation mitigation */ > -#define ret_spec_stop \ > - ret; int3; > - > -/* register macros */ > -#define INPUT %rdi > -#define DST %rsi > -#define SRC %rdx > -#define NBLKS %rcx > -#define ROUND %eax > - > -/* stack structure */ > -#define STACK_VEC_X12 (16) > -#define STACK_VEC_X13 (16 + STACK_VEC_X12) > -#define STACK_TMP (16 + STACK_VEC_X13) > -#define STACK_TMP1 (16 + STACK_TMP) > -#define STACK_TMP2 (16 + STACK_TMP1) > - > -#define STACK_MAX (16 + STACK_TMP2) > - > -/* vector registers */ > -#define X0 %xmm0 > -#define X1 %xmm1 > -#define X2 %xmm2 > -#define X3 %xmm3 > -#define X4 %xmm4 > -#define X5 %xmm5 > -#define X6 %xmm6 > -#define X7 %xmm7 > -#define X8 %xmm8 > -#define X9 %xmm9 > -#define X10 %xmm10 > -#define X11 %xmm11 > -#define X12 %xmm12 > -#define X13 %xmm13 > -#define X14 %xmm14 > -#define X15 %xmm15 > - > 
-/********************************************************************** > - helper macros > - **********************************************************************/ > - > -/* 4x4 32-bit integer matrix transpose */ > -#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \ > - movdqa x0, t2; \ > - punpckhdq x1, t2; \ > - punpckldq x1, x0; \ > - \ > - movdqa x2, t1; \ > - punpckldq x3, t1; \ > - punpckhdq x3, x2; \ > - \ > - movdqa x0, x1; \ > - punpckhqdq t1, x1; \ > - punpcklqdq t1, x0; \ > - \ > - movdqa t2, x3; \ > - punpckhqdq x2, x3; \ > - punpcklqdq x2, t2; \ > - movdqa t2, x2; > - > -/* fill xmm register with 32-bit value from memory */ > -#define PBROADCASTD(mem32, xreg) \ > - movd mem32, xreg; \ > - pshufd $0, xreg, xreg; > - > -/********************************************************************** > - 4-way chacha20 > - **********************************************************************/ > - > -#define ROTATE2(v1,v2,c,tmp1,tmp2) \ > - movdqa v1, tmp1; \ > - movdqa v2, tmp2; \ > - psrld $(32 - (c)), v1; \ > - pslld $(c), tmp1; \ > - paddb tmp1, v1; \ > - psrld $(32 - (c)), v2; \ > - pslld $(c), tmp2; \ > - paddb tmp2, v2; > - > -#define XOR(ds,s) \ > - pxor s, ds; > - > -#define PLUS(ds,s) \ > - paddd s, ds; > - > -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2) \ > - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ > - ROTATE2(d1, d2, 16, tmp1, tmp2); \ > - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ > - ROTATE2(b1, b2, 12, tmp1, tmp2); \ > - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ > - ROTATE2(d1, d2, 8, tmp1, tmp2); \ > - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ > - ROTATE2(b1, b2, 7, tmp1, tmp2); > - > - .section .text.sse2,"ax",@progbits > - > -chacha20_data: > - .align 16 > -L(counter1): > - .long 1,0,0,0 > -L(inc_counter): > - .long 0,1,2,3 > -L(unsigned_cmp): > - .long 0x80000000,0x80000000,0x80000000,0x80000000 > - > - .hidden __chacha20_sse2_blocks4 > -ENTRY (__chacha20_sse2_blocks4) > - /* input: > 
- * %rdi: input > - * %rsi: dst > - * %rdx: src > - * %rcx: nblks (multiple of 4) > - */ > - > - pushq %rbp; > - cfi_adjust_cfa_offset(8); > - cfi_rel_offset(rbp, 0) > - movq %rsp, %rbp; > - cfi_def_cfa_register(%rbp); > - > - subq $STACK_MAX, %rsp; > - andq $~15, %rsp; > - > -L(loop4): > - mov $20, ROUND; > - > - /* Construct counter vectors X12 and X13 */ > - movdqa L(inc_counter) rRIP, X0; > - movdqa L(unsigned_cmp) rRIP, X2; > - PBROADCASTD((12 * 4)(INPUT), X12); > - PBROADCASTD((13 * 4)(INPUT), X13); > - paddd X0, X12; > - movdqa X12, X1; > - pxor X2, X0; > - pxor X2, X1; > - pcmpgtd X1, X0; > - psubd X0, X13; > - movdqa X12, (STACK_VEC_X12)(%rsp); > - movdqa X13, (STACK_VEC_X13)(%rsp); > - > - /* Load vectors */ > - PBROADCASTD((0 * 4)(INPUT), X0); > - PBROADCASTD((1 * 4)(INPUT), X1); > - PBROADCASTD((2 * 4)(INPUT), X2); > - PBROADCASTD((3 * 4)(INPUT), X3); > - PBROADCASTD((4 * 4)(INPUT), X4); > - PBROADCASTD((5 * 4)(INPUT), X5); > - PBROADCASTD((6 * 4)(INPUT), X6); > - PBROADCASTD((7 * 4)(INPUT), X7); > - PBROADCASTD((8 * 4)(INPUT), X8); > - PBROADCASTD((9 * 4)(INPUT), X9); > - PBROADCASTD((10 * 4)(INPUT), X10); > - PBROADCASTD((11 * 4)(INPUT), X11); > - PBROADCASTD((14 * 4)(INPUT), X14); > - PBROADCASTD((15 * 4)(INPUT), X15); > - movdqa X11, (STACK_TMP)(%rsp); > - movdqa X15, (STACK_TMP1)(%rsp); > - > -L(round2_4): > - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X11,X15) > - movdqa (STACK_TMP)(%rsp), X11; > - movdqa (STACK_TMP1)(%rsp), X15; > - movdqa X8, (STACK_TMP)(%rsp); > - movdqa X9, (STACK_TMP1)(%rsp); > - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,X9) > - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,X9) > - movdqa (STACK_TMP)(%rsp), X8; > - movdqa (STACK_TMP1)(%rsp), X9; > - movdqa X11, (STACK_TMP)(%rsp); > - movdqa X15, (STACK_TMP1)(%rsp); > - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X11,X15) > - sub $2, ROUND; > - jnz L(round2_4); > - > - /* tmp := X15 */ > - movdqa (STACK_TMP)(%rsp), X11; > - 
PBROADCASTD((0 * 4)(INPUT), X15); > - PLUS(X0, X15); > - PBROADCASTD((1 * 4)(INPUT), X15); > - PLUS(X1, X15); > - PBROADCASTD((2 * 4)(INPUT), X15); > - PLUS(X2, X15); > - PBROADCASTD((3 * 4)(INPUT), X15); > - PLUS(X3, X15); > - PBROADCASTD((4 * 4)(INPUT), X15); > - PLUS(X4, X15); > - PBROADCASTD((5 * 4)(INPUT), X15); > - PLUS(X5, X15); > - PBROADCASTD((6 * 4)(INPUT), X15); > - PLUS(X6, X15); > - PBROADCASTD((7 * 4)(INPUT), X15); > - PLUS(X7, X15); > - PBROADCASTD((8 * 4)(INPUT), X15); > - PLUS(X8, X15); > - PBROADCASTD((9 * 4)(INPUT), X15); > - PLUS(X9, X15); > - PBROADCASTD((10 * 4)(INPUT), X15); > - PLUS(X10, X15); > - PBROADCASTD((11 * 4)(INPUT), X15); > - PLUS(X11, X15); > - movdqa (STACK_VEC_X12)(%rsp), X15; > - PLUS(X12, X15); > - movdqa (STACK_VEC_X13)(%rsp), X15; > - PLUS(X13, X15); > - movdqa X13, (STACK_TMP)(%rsp); > - PBROADCASTD((14 * 4)(INPUT), X15); > - PLUS(X14, X15); > - movdqa (STACK_TMP1)(%rsp), X15; > - movdqa X14, (STACK_TMP1)(%rsp); > - PBROADCASTD((15 * 4)(INPUT), X13); > - PLUS(X15, X13); > - movdqa X15, (STACK_TMP2)(%rsp); > - > - /* Update counter */ > - addq $4, (12 * 4)(INPUT); > - > - TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15); > - movdqu X0, (64 * 0 + 16 * 0)(DST) > - movdqu X1, (64 * 1 + 16 * 0)(DST) > - movdqu X2, (64 * 2 + 16 * 0)(DST) > - movdqu X3, (64 * 3 + 16 * 0)(DST) > - TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2); > - movdqa (STACK_TMP)(%rsp), X13; > - movdqa (STACK_TMP1)(%rsp), X14; > - movdqa (STACK_TMP2)(%rsp), X15; > - movdqu X4, (64 * 0 + 16 * 1)(DST) > - movdqu X5, (64 * 1 + 16 * 1)(DST) > - movdqu X6, (64 * 2 + 16 * 1)(DST) > - movdqu X7, (64 * 3 + 16 * 1)(DST) > - TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2); > - movdqu X8, (64 * 0 + 16 * 2)(DST) > - movdqu X9, (64 * 1 + 16 * 2)(DST) > - movdqu X10, (64 * 2 + 16 * 2)(DST) > - movdqu X11, (64 * 3 + 16 * 2)(DST) > - TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2); > - movdqu X12, (64 * 0 + 16 * 3)(DST) > - movdqu X13, (64 * 1 + 16 * 3)(DST) > - movdqu X14, (64 * 2 + 16 * 
3)(DST) > - movdqu X15, (64 * 3 + 16 * 3)(DST) > - > - sub $4, NBLKS; > - lea (4 * 64)(DST), DST; > - lea (4 * 64)(SRC), SRC; > - jnz L(loop4); > - > - /* eax zeroed by round loop. */ > - leave; > - cfi_adjust_cfa_offset(-8) > - cfi_def_cfa_register(%rsp); > - ret_spec_stop; > -END (__chacha20_sse2_blocks4) > - > -#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */ > diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h > deleted file mode 100644 > index 6f3784e392..0000000000 > --- a/sysdeps/x86_64/chacha20_arch.h > +++ /dev/null > @@ -1,55 +0,0 @@ > -/* Chacha20 implementation, used on arc4random. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. 
*/ > - > -#include <isa-level.h> > -#include <ldsodefs.h> > -#include <cpu-features.h> > -#include <sys/param.h> > - > -unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t nblks) > - attribute_hidden; > -unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t nblks) > - attribute_hidden; > - > -static inline void > -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, > - size_t bytes) > -{ > - _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0, > - "CHACHA20_BUFSIZE not multiple of 4 or 8"); > - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, > - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); > - > -#if MINIMUM_X86_ISA_LEVEL > 2 > - __chacha20_avx2_blocks8 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > -#else > - const struct cpu_features* cpu_features = __get_cpu_features (); > - > - /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. */ > - if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) > - && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !)) > - __chacha20_avx2_blocks8 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > - else > - __chacha20_sse2_blocks4 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > -#endif > -} ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 11:33 ` Adhemerval Zanella Netto
@ 2022-07-26 11:54 ` Jason A. Donenfeld
  2022-07-26 12:08 ` Jason A. Donenfeld
  ` (2 more replies)
  0 siblings, 3 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 11:54 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Florian Weimer, Cristian Rodríguez, Paul Eggert, linux-crypto

Hi Adhemerval,

Thanks for your review.

On Tue, Jul 26, 2022 at 08:33:23AM -0300, Adhemerval Zanella Netto wrote:
> There are some missing pieces, like sysdeps/unix/sysv/linux/tls-internal.h comment,
> sysdeps/generic/tls-internal-struct.h generic piece (it is used on hurd build),
> maybe also change the NEWS to state this is not a CSPRNG, and we definitely need
> to update the manual. Some comments below.

I think Eric already pointed those out, and they're fixed in v3 now.
PTAL.

> > + static bool have_getrandom = true, seen_initialized = false;
> > + int fd;
>
> I think it should be reasonable to assume that the getrandom syscall will always be
> supported, and using arc4random in an environment with filtered getrandom does
> not make much sense. We are trying to avoid adding these static syscall checks
> where possible,

I don't know glibc's requirements for kernels, though I do know that
it'd be nice to not have to write this fallback code in every program I
write and just use libc's thing. So in that sense, having the fallback
to /dev/urandom makes arc4random_buf a lot more useful. But with that
said, yea, maybe we shouldn't care about old kernels? getrandom is now
quite old and the stable kernels on kernel.org all have it.

From my perspective, I don't have a strongly developed opinion on what
makes sense for glibc. If Florian agrees with you, I'll send a v+1 with
the fallback code removed.
If it's contentious, maybe the fallback code should stay in and we can slate it for removal on another day, when the minimum glibc kernel version gets raised or something like that. > also plain load/store to se the static have_getrandom > is strickly a race-condition, although it should not really matter (we use > relaxed load/store in such optimization (check > sysdeps/unix/sysv/linux/mips/mips64/getdents64.c). I was aware of the race but figured it didn't matter, since two racing threads will both set it to the same result eventually. But I didn't know about the convention of using those relaxed wrapper functions. Thanks for the tip. I'll do that for v4. > Also, does it make sense to fallback if we build for a kernel that should > always support getrandom? I guess only if syscall filtering is a concern. But if not, then maybe yea? We could do this in a follow-up commit, or I could do this in v4. Would `#if __LINUX_KERNEL_VERSION >` be the right mechanism to use here? If so, I think the way I'd implement that would be: diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c index 978bf9287f..a33d9ff2c5 100644 --- a/stdlib/arc4random.c +++ b/stdlib/arc4random.c @@ -44,8 +44,10 @@ __arc4random_buf (void *p, size_t n) { ssize_t l; +#if __LINUX_KERNEL_VERSION < something if (!atomic_load_relaxed (&have_getrandom)) break; +#endif l = __getrandom_nocancel (p, n, 0); if (l > 0) @@ -60,11 +62,13 @@ __arc4random_buf (void *p, size_t n) arc4random_getrandom_failure (); /* Weird, should never happen. */ else if (l == -EINTR) continue; /* Interrupted by a signal; keep going. */ +#if __LINUX_KERNEL_VERSION < something else if (l == -ENOSYS) { atomic_store_relaxed (&have_getrandom, false); break; /* No syscall, so fallback to /dev/urandom. */ } +#endif arc4random_getrandom_failure (); /* Unknown error, should never happen. */ } And then arc4random_getrandom_failure() being a noreturn function would make gcc optimize out the rest. Does that seem like a good approach? 
> > + l = __getrandom_nocancel (p, n, 0); > > Do we need to worry about a potentially uncancellable blocking call here? I guess > using GRND_NONBLOCK does not really help. No, generally not. Also, keep in mind that getrandom(0) will trigger jitter entropy if the kernel isn't already initialized. > > > + if (l > 0) > > + { > > + if ((size_t) l == n) > > Do we need the cast here? Generally it's frowned upon to have implicit signed conversion, right? l is signed while n is unsigned. > > > + return; /* Done reading, success. */ > > Minor style issue: use double space before period. I was really confused by this, and then opened up some other files and saw you meant *after* period. :) Will do for v4. > As Florian said we will need a non cancellable poll here. Since you are setting > the timeout as undefined, I think it would be simple to just add a non cancellable > wrapper as: > > int __ppoll_noncancel_notimeout (struct pollfd *fds, nfds_t nfds) > { > #ifndef __NR_ppoll_time64 > # define __NR_ppoll_time64 __NR_ppoll > #endif > return INLINE_SYSCALL_CALL (__NR_ppoll_time64, fds, nfds, NULL, NULL, 0); > } > > So we don't need to handle the timeout for 64-bit time_t wrappers. Oh that sounds like a good solution to the time64 situation. I'll do that for v4... BUT, I already implemented possibly the wrong solution for v3. Could you take a look at what I did there and confirm that it's wrong? If so, then I'll do exactly what you suggested here. Thanks again for the review, Jason ^ permalink raw reply [flat|nested] 81+ messages in thread
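The relaxed load/store convention mentioned in this message can be sketched outside glibc with C11 atomics. This is only an illustration under stated assumptions: `have_syscall`, `try_fast_path`, and the two probe functions are hypothetical names, and `<stdatomic.h>` stands in for glibc's internal atomic_load_relaxed/atomic_store_relaxed wrappers. Relaxed ordering is enough here because every racing thread would store the same value.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Cached result of the "is the syscall there?" probe.  Starts optimistic.  */
static atomic_bool have_syscall = true;

/* Hypothetical probe type: returns 0 on success or a negative errno value.  */
typedef int (*probe_fn) (void);

/* Hypothetical probes used only to exercise the sketch.  */
int probe_ok (void) { return 0; }
int probe_missing (void) { return -ENOSYS; }

/* Returns true if the fast path was used, false if the caller must fall
   back.  Once a probe reports -ENOSYS, later calls skip the probe.  */
bool
try_fast_path (probe_fn probe)
{
  if (!atomic_load_explicit (&have_syscall, memory_order_relaxed))
    return false;
  if (probe () == -ENOSYS)
    {
      atomic_store_explicit (&have_syscall, false, memory_order_relaxed);
      return false;
    }
  return true;
}
```

After one probe reports -ENOSYS, later calls return false without probing again, which is the behaviour the static have_getrandom flag in the patch is after.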
* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 11:54 ` Jason A. Donenfeld
@ 2022-07-26 12:08 ` Jason A. Donenfeld
  2022-07-26 12:20 ` Jason A. Donenfeld
  2022-07-26 12:34 ` Adhemerval Zanella Netto
  2 siblings, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 12:08 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Florian Weimer, Cristian Rodríguez, Paul Eggert, linux-crypto

Hey again,

On Tue, Jul 26, 2022 at 01:54:23PM +0200, Jason A. Donenfeld wrote:
> > As Florian said we will need a non cancellable poll here. Since you are setting
> > the timeout as undefined, I think it would be simple to just add a non cancellable
> > wrapper as:
> >
> > int __ppoll_noncancel_notimeout (struct pollfd *fds, nfds_t nfds)
> > {
> > #ifndef __NR_ppoll_time64
> > # define __NR_ppoll_time64 __NR_ppoll
> > #endif
> >   return INLINE_SYSCALL_CALL (__NR_ppoll_time64, fds, nfds, NULL, NULL, 0);
> > }
> >
> > So we don't need to handle the timeout for 64-bit time_t wrappers.
>
> Oh that sounds like a good solution to the time64 situation. I'll do
> that for v4... BUT, I already implemented possibly the wrong solution
> for v3. Could you take a look at what I did there and confirm that it's
> wrong? If so, then I'll do exactly what you suggested here.

Actually, forget my v3. What you're suggesting is also better because
it's ppoll, not poll, as poll isn't on all platforms. So I'll do things
exactly as you've described for v4.

Jason

^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 11:54 ` Jason A. Donenfeld
  2022-07-26 12:08 ` Jason A. Donenfeld
@ 2022-07-26 12:20 ` Jason A. Donenfeld
  2022-07-26 12:34 ` Adhemerval Zanella Netto
  2 siblings, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 12:20 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Florian Weimer, Cristian Rodríguez, Paul Eggert, linux-crypto

On Tue, Jul 26, 2022 at 01:54:23PM +0200, Jason A. Donenfeld wrote:
> > Also, does it make sense to fallback if we build for a kernel that should
> > always support getrandom?
>
> I guess only if syscall filtering is a concern. But if not, then maybe
> yea? We could do this in a follow-up commit, or I could do this in v4.
> Would `#if __LINUX_KERNEL_VERSION >` be the right mechanism to use here?
> If so, I think the way I'd implement that would be:
>
> [...]
>
> And then arc4random_getrandom_failure() being a noreturn function would
> make gcc optimize out the rest.
>
> Does that seem like a good approach?

It actually winds up looking a bit more like the below. Let me know if
you want that in v4.

diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
index c0f132ea9b..8fcf41e7de 100644
--- a/stdlib/arc4random.c
+++ b/stdlib/arc4random.c
@@ -43,7 +43,7 @@ __arc4random_buf (void *p, size_t n)
     {
       ssize_t l;
 
-      if (!atomic_load_relaxed (&have_getrandom))
+      if (!__ASSUME_GETRANDOM && !atomic_load_relaxed (&have_getrandom))
         break;
 
       l = __getrandom_nocancel (p, n, 0);
@@ -59,7 +59,7 @@ __arc4random_buf (void *p, size_t n)
         arc4random_getrandom_failure (); /* Weird, should never happen. */
       else if (l == -EINTR)
         continue; /* Interrupted by a signal; keep going. */
-      else if (l == -ENOSYS)
+      else if (!__ASSUME_GETRANDOM && l == -ENOSYS)
         {
           atomic_store_relaxed (&have_getrandom, false);
           break; /* No syscall, so fallback to /dev/urandom. */
diff --git a/sysdeps/unix/sysv/linux/kernel-features.h b/sysdeps/unix/sysv/linux/kernel-features.h
index 74adc3956b..75d5f953d4 100644
--- a/sysdeps/unix/sysv/linux/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/kernel-features.h
@@ -236,4 +236,11 @@
 # define __ASSUME_FUTEX_LOCK_PI2 0
 #endif
 
+/* The getrandom() syscall was added in 3.17. */
+#if __LINUX_KERNEL_VERSION >= 0x031100
+# define __ASSUME_GETRANDOM 1
+#else
+# define __ASSUME_GETRANDOM 0
+#endif
+
 #endif /* kernel-features.h */

^ permalink raw reply [flat|nested] 81+ messages in thread
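The constant 0x031100 in the hunk above packs the kernel version one byte per component: 3 is 0x03, 17 is 0x11, and the patch level is 0x00. A small sketch of that arithmetic follows; `KVER`, `MIN_KERNEL_VERSION`, `ASSUME_GETRANDOM`, and `getrandom_fallback_needed` are hypothetical stand-ins for illustration, not glibc identifiers.

```c
/* Hypothetical helper: pack a version as major.minor.patch, one byte each,
   matching the 0x031100 == 3.17.0 constant used in the patch.  */
#define KVER(major, minor, patch) \
  (((major) << 16) | ((minor) << 8) | (patch))

/* Stand-in for the configure-time minimum kernel; this models a build
   that still supports 3.2.0, the minimum mentioned in the thread.  */
#define MIN_KERNEL_VERSION KVER (3, 2, 0)

/* Same shape as the __ASSUME_GETRANDOM definition in the patch.  */
#if MIN_KERNEL_VERSION >= KVER (3, 17, 0)
# define ASSUME_GETRANDOM 1
#else
# define ASSUME_GETRANDOM 0
#endif

_Static_assert (KVER (3, 17, 0) == 0x031100, "3.17.0 packs to 0x031100");

/* With a 3.2.0 minimum the ENOSYS fallback must be kept, so this is 1.
   Because the macro is a compile-time constant, a test such as
   `!ASSUME_GETRANDOM && l == -ENOSYS` folds away when it is 1.  */
int
getrandom_fallback_needed (void)
{
  return !ASSUME_GETRANDOM;
}
```

Changing the modeled minimum to KVER (3, 17, 0) or later flips the macro to 1, which is the situation in which the fallback branches in the patch become dead code.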
* Re: [PATCH v2] arc4random: simplify design for better safety 2022-07-26 11:54 ` Jason A. Donenfeld 2022-07-26 12:08 ` Jason A. Donenfeld 2022-07-26 12:20 ` Jason A. Donenfeld @ 2022-07-26 12:34 ` Adhemerval Zanella Netto 2022-07-26 12:47 ` Jason A. Donenfeld 2 siblings, 1 reply; 81+ messages in thread From: Adhemerval Zanella Netto @ 2022-07-26 12:34 UTC (permalink / raw) To: Jason A. Donenfeld Cc: libc-alpha, Florian Weimer, Cristian Rodríguez, Paul Eggert, linux-crypto On 26/07/22 08:54, Jason A. Donenfeld wrote: > Hi Adhemerval, > > Thanks for your review. > > On Tue, Jul 26, 2022 at 08:33:23AM -0300, Adhemerval Zanella Netto wrote: >> Ther are some missing pieces, like sysdeps/unix/sysv/linux/tls-internal.h comment, >> sysdeps/generic/tls-internal-struct.h generic piece (it is used on hurd build), >> maybe also change the NEWS to state this is not a CSPRNG, and we definitely need >> to update the manual. Some comments below. > > I think Eric already pointed those out, and they're fixed in v3 now. > PTAL. > >>> + static bool have_getrandom = true, seen_initialized = false; >>> + int fd; >> >> I think it should reasonable to assume that getrandom syscall will be always >> supported and using arc4random in an enviroment with filtered getrandom does >> not make much sense. We are trying to avoid add this static syscall checks >> where possible, > > I don't know glibc's requirements for kernels, though I do know that > it'd be nice to not have to write this fallback code in every program I > write and just use libc's thing. So in that sense, having the fallback > to /dev/urandom makes arc4random_buf a lot more useful. But with that > said, yea, maybe we shouldn't care about old kernels? getrandom is now > quite old and the stable kernels on kernel.org all have it. 
We do not enforce a kernel version anymore, although we still support
--enable-kernel=x.y, which changes which syscalls glibc internally assumes
to be available (so there is no need for a fallback in that case). So the
question is whether we need the fallback code for --enable-kernel=3.17. If
the kernel returns ENOSYS in this case (assuming you are running on a
kernel newer than 3.17), it means some syscall filtering is in place, and I
am not sure we need to actually handle it. The main idea of adding this
minor optimization is that once we increase the minimum supported kernel we
can clean this code up.

>
> From my perspective, I don't have a strongly developed opinion on what
> makes sense for glibc. If Florian agrees with you, I'll send a v+1 with
> the fallback code removed. If it's contentious, maybe the fallback code
> should stay in and we can slate it for removal on another day, when the
> minimum glibc kernel version gets raised or something like that.

I think the fallback code makes sense since the minimum kernel we still
support is 3.2, although I am not sure how getrandom and/or /dev/urandom
will behave on such old kernels.

>
>> also plain load/store to se the static have_getrandom
>> is strickly a race-condition, although it should not really matter (we use
>> relaxed load/store in such optimization (check
>> sysdeps/unix/sysv/linux/mips/mips64/getdents64.c).
>
> I was aware of the race but figured it didn't matter, since two racing
> threads will both set it to the same result eventually. But I didn't
> know about the convention of using those relaxed wrapper functions.
> Thanks for the tip. I'll do that for v4.
>
>> Also, does it make sense to fallback if we build for a kernel that should
>> always support getrandom?
>
> I guess only if syscall filtering is a concern. But if not, then maybe
> yea? We could do this in a follow-up commit, or I could do this in v4.
> Would `#if __LINUX_KERNEL_VERSION >` be the right mechanism to use here?
> If so, I think the way I'd implement that would be: > > diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c > index 978bf9287f..a33d9ff2c5 100644 > --- a/stdlib/arc4random.c > +++ b/stdlib/arc4random.c > @@ -44,8 +44,10 @@ __arc4random_buf (void *p, size_t n) > { > ssize_t l; > > +#if __LINUX_KERNEL_VERSION < something > if (!atomic_load_relaxed (&have_getrandom)) > break; > +#endif> > l = __getrandom_nocancel (p, n, 0); > if (l > 0) > @@ -60,11 +62,13 @@ __arc4random_buf (void *p, size_t n) > arc4random_getrandom_failure (); /* Weird, should never happen. */ > else if (l == -EINTR) > continue; /* Interrupted by a signal; keep going. */ > +#if __LINUX_KERNEL_VERSION < something > else if (l == -ENOSYS) > { > atomic_store_relaxed (&have_getrandom, false); > break; /* No syscall, so fallback to /dev/urandom. */ > } > +#endif > arc4random_getrandom_failure (); /* Unknown error, should never happen. */ > } > > And then arc4random_getrandom_failure() being a noreturn function would > make gcc optimize out the rest. > > Does that seem like a good approach? I think so, although he __LINUX_KERNEL_VERSION is Linux-only that should be moved to sysdeps/unix/sysv/linux. Usually we do as a wrapper (static inline or hidden symbol), with the generic implementation on sysdep/generic or include with Linux redefining on its own folder. We also a use __ASSUME macros (check sysdeps/unix/sysv/linux/kernel-features.h), it should be something like __ASSUME_GETRANDOM (we did not have a use for it because we do not want a fallback for getrandom implementation). 
So I would add something like:

sysdeps/unix/sysv/linux/arc4random_impl.h

  /* Returns 1 on success, 0 if the caller should fall back, -1 on error.  */
  static inline int
  getentropy_arch (void *p, size_t n)
  {
    for (;;)
      {
        ssize_t l = __getrandom_nocancel (p, n, 0);
        if (l > 0)
          {
            if ((size_t) l == n)
              return 1;
            /* Short read; advance and keep going.  */
            p = (uint8_t *) p + l;
            n -= l;
            continue;
          }
        else if (l == 0)
          return -1;
        else if (l == -EINTR)
          continue;
  #if !__ASSUME_GETRANDOM
        if (l == -ENOSYS)
          return 0;
  #endif
        return -1;
      }
  }

And on stdlib/arc4random.c:

  void
  __arc4random_buf (void *p, size_t n)
  {
    if (n == 0)
      return;

    int s = getentropy_arch (p, n);
    if (s > 0)
      return;
    if (s < 0)
      arc4random_getrandom_failure ();

    /* Fallback.  */
  }

>
>>> + l = __getrandom_nocancel (p, n, 0);
>>
>> Do we need to worry about a potentially uncancellable blocking call here? I guess
>> using GRND_NONBLOCK does not really help.
>
> No, generally not. Also, keep in mind that getrandom(0) will trigger
> jitter entropy if the kernel isn't already initialized.

Maybe add a comment stating it.

>
>>
>>> + if (l > 0)
>>> + {
>>> + if ((size_t) l == n)
>>
>> Do we need the cast here?
>
> Generally it's frowned upon to have implicit signed conversion, right? l
> is signed while n is unsigned.

Good question, I don't think we enforce it in fact.

>
>>
>>> + return; /* Done reading, success. */
>>
>> Minor style issue: use double space before period.
>
> I was really confused by this, and then opened up some other files and
> saw you meant *after* period. :) Will do for v4.

Yeah, I meant after indeed.

>
>> As Florian said we will need a non cancellable poll here. Since you are setting
>> the timeout as undefined, I think it would be simple to just add a non cancellable
>> wrapper as:
>>
>> int __ppoll_noncancel_notimeout (struct pollfd *fds, nfds_t nfds)
>> {
>> #ifndef __NR_ppoll_time64
>> # define __NR_ppoll_time64 __NR_ppoll
>> #endif
>>   return INLINE_SYSCALL_CALL (__NR_ppoll_time64, fds, nfds, NULL, NULL, 0);
>> }
>>
>> So we don't need to handle the timeout for 64-bit time_t wrappers.
>
> Oh that sounds like a good solution to the time64 situation.
I'll do > that for v4... BUT, I already implemented possibly the wrong solution > for v3. Could you take a look at what I did there and confirm that it's > wrong? If so, then I'll do exactly what you suggested here. > > Thanks again for the review, > Jason ^ permalink raw reply [flat|nested] 81+ messages in thread
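The three-outcome helper suggested in this reply can be tried outside glibc with the public getrandom() wrapper from <sys/random.h>. What follows is a sketch under assumptions rather than the proposed internal code: `fill_from_getrandom` is a hypothetical name, and unlike the internal __getrandom_nocancel shown in the thread (which returns a negative errno value), the public wrapper returns -1 and sets errno.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper.  Returns 1 when the whole buffer was filled,
   0 when getrandom is unavailable (the caller should fall back to
   /dev/urandom), and -1 on any other failure.  */
int
fill_from_getrandom (void *p, size_t n)
{
  uint8_t *out = p;

  while (n > 0)
    {
      ssize_t l = getrandom (out, n, 0);
      if (l > 0)
        {
          /* Possibly a short read; advance and keep going.  */
          out += l;
          n -= (size_t) l;
        }
      else if (l == 0)
        return -1;              /* Should never happen.  */
      else if (errno == EINTR)
        continue;               /* Interrupted by a signal; retry.  */
      else if (errno == ENOSYS)
        return 0;               /* No syscall; caller falls back.  */
      else
        return -1;              /* Unknown error.  */
    }
  return 1;
}
```

A caller would treat a return of 1 as done, 0 as the cue to take the /dev/urandom path, and -1 as fatal, mirroring the split between getentropy_arch and __arc4random_buf proposed above.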
* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 12:34 ` Adhemerval Zanella Netto
@ 2022-07-26 12:47 ` Jason A. Donenfeld
  2022-07-26 13:11 ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 12:47 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Florian Weimer, Cristian Rodríguez, Paul Eggert, linux-crypto

Hi Adhemerval,

On Tue, Jul 26, 2022 at 09:34:57AM -0300, Adhemerval Zanella Netto wrote:
> kernel newer than 3.17) it means some syscall filtering, and I am not sure
> we should need to actually handle it.

One thing to keep in mind is that people who use CUSE-based /dev/urandom
implementations might not like this, as it means they'd also have to
intercept getrandom() rather than just ENOSYS'ing it. But maybe that's
fine. I don't know of anyone actually doing this in the real world at
the moment.

Jason

^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v2] arc4random: simplify design for better safety
  2022-07-26 12:47 ` Jason A. Donenfeld
@ 2022-07-26 13:11 ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 81+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-26 13:11 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: libc-alpha, Florian Weimer, Cristian Rodríguez, Paul Eggert, linux-crypto

On 26/07/22 09:47, Jason A. Donenfeld wrote:
> Hi Adhemerval,
>
> On Tue, Jul 26, 2022 at 09:34:57AM -0300, Adhemerval Zanella Netto wrote:
>> kernel newer than 3.17) it means some syscall filtering, and I am not sure
>> we should need to actually handle it.
>
> One thing to keep in mind is that people who use CUSE-based /dev/urandom
> implementations might not like this, as it means they'd also have to
> intercept getrandom() rather than just ENOSYS'ing it. But maybe that's
> fine. I don't know of anyone actually doing this in the real world at
> the moment.
>

I think it is a fair assumption that if you are trying to implement your
own character device in userland, you should know the implications for the
environment. From the glibc standpoint, and I would say for this whole
thread, we should assume that getrandom is the de facto API for entropy.

^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH v4] arc4random: simplify design for better safety
  2022-07-25 22:57 ` [PATCH] arc4random: simplify design for better safety Jason A. Donenfeld
  2022-07-25 23:11 ` Jason A. Donenfeld
  2022-07-25 23:28 ` [PATCH v2] " Jason A. Donenfeld
@ 2022-07-26 13:30 ` Jason A. Donenfeld
  2022-07-26 15:21 ` Yann Droneaud
  ` (2 more replies)
  2 siblings, 3 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 13:30 UTC (permalink / raw)
  To: libc-alpha
  Cc: Jason A. Donenfeld, Adhemerval Zanella Netto, Florian Weimer, Cristian Rodríguez, Paul Eggert, Mark Harris, Eric Biggers, linux-crypto

Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc can work
together on optimizing that.

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering altogether.
Future improvements may include userspace learning more from the kernel
about when to do that, which might make these sorts of chacha20-based
optimizations more possible. The current heuristic of 16 MiB is
meaningless garbage that doesn't correspond to anything the kernel might
know about. So for now, let's just do something conservative that we
know is correct and won't lead to cryptographic issues for users of this
function.

This patch might be considered along the lines of, "premature
optimization is the root of all evil," in that the much more complex
implementation it replaces moves too fast without considering security
implications, whereas the incremental approach done here is a much safer
way of going about things.
Once this lands, we can take our time in optimizing this properly using new interplay between the kernel and userspace. getrandom(0) is used, since that's the one that ensures the bytes returned are cryptographically secure. But on systems without it, we fallback to using /dev/urandom. This is unfortunate because it means opening a file descriptor, but there's not much of a choice. Secondly, as part of the fallback, in order to get more or less the same properties of getrandom(0), we poll on /dev/random, and if the poll succeeds at least once, then we assume the RNG is initialized. This is a rough approximation, as the ancient "non-blocking pool" initialized after the "blocking pool", not before, and it may not port back to all ancient kernels, but it does to a decent swath of them, so generally it's the best approximation we can do. The motivation for including arc4random, in the first place, is to have source-level compatibility with existing code. That means this patch doesn't attempt to litigate the interface itself. It does, however, choose a conservative approach for implementing it. Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: Cristian Rodríguez <crrodriguez@opensuse.org> Cc: Paul Eggert <eggert@cs.ucla.edu> Cc: Mark Harris <mark.hsj@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: linux-crypto@vger.kernel.org Signed-off-by: Jason A. 
Donenfeld <Jason@zx2c4.com> --- LICENSES | 23 - NEWS | 4 +- include/stdlib.h | 3 - manual/math.texi | 13 +- stdlib/Makefile | 2 - stdlib/arc4random.c | 205 ++----- stdlib/arc4random.h | 48 -- stdlib/chacha20.c | 191 ------ stdlib/tst-arc4random-chacha20.c | 167 ----- sysdeps/aarch64/Makefile | 4 - sysdeps/aarch64/chacha20-aarch64.S | 314 ---------- sysdeps/aarch64/chacha20_arch.h | 40 -- sysdeps/generic/tls-internal-struct.h | 1 - sysdeps/generic/tls-internal.c | 10 - sysdeps/mach/hurd/_Fork.c | 2 - sysdeps/mach/hurd/kernel-features.h | 1 + sysdeps/nptl/_Fork.c | 2 - .../powerpc/powerpc64/be/multiarch/Makefile | 4 - .../powerpc64/be/multiarch/chacha20-ppc.c | 1 - .../powerpc64/be/multiarch/chacha20_arch.h | 42 -- sysdeps/powerpc/powerpc64/power8/Makefile | 5 - .../powerpc/powerpc64/power8/chacha20-ppc.c | 256 -------- .../powerpc/powerpc64/power8/chacha20_arch.h | 37 -- sysdeps/s390/s390-64/Makefile | 6 - sysdeps/s390/s390-64/chacha20-s390x.S | 573 ------------------ sysdeps/s390/s390-64/chacha20_arch.h | 45 -- sysdeps/unix/sysv/linux/Makefile | 3 +- sysdeps/unix/sysv/linux/Versions | 1 + sysdeps/unix/sysv/linux/kernel-features.h | 7 + sysdeps/unix/sysv/linux/not-cancel.h | 6 + .../sysv/linux/ppoll_nocancel.c} | 19 +- sysdeps/unix/sysv/linux/tls-internal.c | 10 - sysdeps/unix/sysv/linux/tls-internal.h | 1 - sysdeps/x86_64/Makefile | 7 - sysdeps/x86_64/chacha20-amd64-avx2.S | 328 ---------- sysdeps/x86_64/chacha20-amd64-sse2.S | 311 ---------- sysdeps/x86_64/chacha20_arch.h | 55 -- 37 files changed, 89 insertions(+), 2658 deletions(-) delete mode 100644 stdlib/arc4random.h delete mode 100644 stdlib/chacha20.c delete mode 100644 stdlib/tst-arc4random-chacha20.c delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S delete mode 100644 sysdeps/aarch64/chacha20_arch.h delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c delete mode 100644 
sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h rename sysdeps/{generic/chacha20_arch.h => unix/sysv/linux/ppoll_nocancel.c} (62%) delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S delete mode 100644 sysdeps/x86_64/chacha20_arch.h diff --git a/LICENSES b/LICENSES index cd04fb6e84..530893b1dc 100644 --- a/LICENSES +++ b/LICENSES @@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov> You should have received a copy of the GNU Lesser General Public License along with this library; if not, see <https://www.gnu.org/licenses/>. */ -\f -sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S, -sysdeps/x86_64/chacha20-amd64-avx2.S, and -sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and -sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt, -with the following notices: - -Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - -This file is part of Libgcrypt. - -Libgcrypt is free software; you can redistribute it and/or modify -it under the terms of the GNU Lesser General Public License as -published by the Free Software Foundation; either version 2.1 of -the License, or (at your option) any later version. - -Libgcrypt is distributed in the hope that it will be useful, -but WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -GNU Lesser General Public License for more details. - -You should have received a copy of the GNU Lesser General Public -License along with this program; if not, see <https://www.gnu.org/licenses/>. 
diff --git a/NEWS b/NEWS index 8420a65cd0..fe531bfe1e 100644 --- a/NEWS +++ b/NEWS @@ -61,8 +61,8 @@ Major new features: is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type). * The functions arc4random, arc4random_buf, and arc4random_uniform have been - added. The functions use a pseudo-random number generator along with - entropy from the kernel. + added. The functions wrap getrandom and/or /dev/urandom to return high- + quality randomness from the kernel. Deprecated and removed features, and other changes affecting compatibility: diff --git a/include/stdlib.h b/include/stdlib.h index cae7f7cdf8..db51f4a4f6 100644 --- a/include/stdlib.h +++ b/include/stdlib.h @@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform; libc_hidden_proto (__arc4random_uniform); extern void __arc4random_buf_internal (void *buffer, size_t len) attribute_hidden; -/* Called from the fork function to reinitialize the internal cipher state - in child process. */ -extern void __arc4random_fork_subprocess (void) attribute_hidden; extern double __strtod_internal (const char *__restrict __nptr, char **__restrict __endptr, int __group) diff --git a/manual/math.texi b/manual/math.texi index 141695cc30..6d69bbff66 100644 --- a/manual/math.texi +++ b/manual/math.texi @@ -1993,17 +1993,10 @@ This section describes the random number functions provided as a GNU extension, based on OpenBSD interfaces. @Theglibc{} uses kernel entropy obtained either through @code{getrandom} -or by reading @file{/dev/urandom} to seed and periodically re-seed the -internal state. A per-thread data pool is used, which allows fast output -generation. +or by reading @file{/dev/urandom} to seed. -Although these functions provide higher random quality than ISO, BSD, and -SVID functions, these still use a Pseudo-Random generator and should not -be used in cryptographic contexts. - -The internal state is cleared and reseeded with kernel entropy on @code{fork} -and @code{_Fork}. 
It is not cleared on either a direct @code{clone} syscall -or when using @theglibc{} @code{syscall} function. +These functions provide higher random quality than ISO, BSD, and SVID +functions, and may be used in cryptographic contexts. The prototypes for these functions are in @file{stdlib.h}. @pindex stdlib.h diff --git a/stdlib/Makefile b/stdlib/Makefile index a900962685..f7b25c1981 100644 --- a/stdlib/Makefile +++ b/stdlib/Makefile @@ -246,7 +246,6 @@ tests := \ # tests tests-internal := \ - tst-arc4random-chacha20 \ tst-strtod1i \ tst-strtod3 \ tst-strtod4 \ @@ -256,7 +255,6 @@ tests-internal := \ # tests-internal tests-static := \ - tst-arc4random-chacha20 \ tst-secure-getenv \ # tests-static diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c index 65547e79aa..8fcf41e7de 100644 --- a/stdlib/arc4random.c +++ b/stdlib/arc4random.c @@ -1,4 +1,4 @@ -/* Pseudo Random Number Generator based on ChaCha20. +/* Pseudo Random Number Generator Copyright (C) 2022 Free Software Foundation, Inc. This file is part of the GNU C Library. @@ -16,7 +16,6 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. */ -#include <arc4random.h> #include <errno.h> #include <not-cancel.h> #include <stdio.h> @@ -24,53 +23,6 @@ #include <sys/mman.h> #include <sys/param.h> #include <sys/random.h> -#include <tls-internal.h> - -/* arc4random keeps two counters: 'have' is the current valid bytes not yet - consumed in 'buf' while 'count' is the maximum number of bytes until a - reseed. - - Both the initial seed and reseed try to obtain entropy from the kernel - and abort the process if none could be obtained. - - The state 'buf' improves the usage of the cipher calls, allowing to call - optimized implementations (if the architecture provides it) and minimize - function call overhead. */ - -#include <chacha20.c> - -/* Called from the fork function to reset the state. 
*/ -void -__arc4random_fork_subprocess (void) -{ - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state; - if (state != NULL) - { - explicit_bzero (state, sizeof (*state)); - /* Force key init. */ - state->count = -1; - } -} - -/* Return the current thread random state or try to create one if there is - none available. In the case malloc can not allocate a state, arc4random - will try to get entropy with arc4random_getentropy. */ -static struct arc4random_state_t * -arc4random_get_state (void) -{ - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state; - if (state == NULL) - { - state = malloc (sizeof (struct arc4random_state_t)); - if (state != NULL) - { - /* Force key initialization on first call. */ - state->count = -1; - __glibc_tls_internal ()->rand_state = state; - } - } - return state; -} static void arc4random_getrandom_failure (void) @@ -78,106 +30,72 @@ arc4random_getrandom_failure (void) __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n"); } -static void -arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen) +void +__arc4random_buf (void *p, size_t n) { - chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf); + static bool have_getrandom = true, seen_initialized = false; + int fd; - /* Mix optional user provided data. */ - if (rnd != NULL) - { - size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); - for (size_t i = 0; i < m; i++) - state->buf[i] ^= rnd[i]; - } - - /* Immediately reinit for backtracking resistance. 
*/ - chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE); - explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); - state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); -} - -static void -arc4random_getentropy (void *rnd, size_t len) -{ - if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len) + if (n == 0) return; - int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom", - O_RDONLY | O_CLOEXEC)); - if (fd != -1) + for (;;) { - uint8_t *p = rnd; - uint8_t *end = p + len; - do - { - ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p)); - if (ret <= 0) - arc4random_getrandom_failure (); - p += ret; - } - while (p < end); + ssize_t l; - if (__close_nocancel (fd) == 0) - return; - } - arc4random_getrandom_failure (); -} + if (!__ASSUME_GETRANDOM && !atomic_load_relaxed (&have_getrandom)) + break; -/* Check if the thread context STATE should be reseed with kernel entropy - depending of requested LEN bytes. If there is less than requested, - the state is either initialized or reseeded, otherwise the internal - counter subtract the requested length. */ -static void -arc4random_check_stir (struct arc4random_state_t *state, size_t len) -{ - if (state->count <= len || state->count == -1) - { - uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE]; - arc4random_getentropy (rnd, sizeof rnd); - - if (state->count == -1) - chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE); - else - arc4random_rekey (state, rnd, sizeof rnd); - - explicit_bzero (rnd, sizeof rnd); - - /* Invalidate the buf. */ - state->have = 0; - memset (state->buf, 0, sizeof state->buf); - state->count = CHACHA20_RESEED_SIZE; + l = __getrandom_nocancel (p, n, 0); + if (l > 0) + { + if ((size_t) l == n) + return; /* Done reading, success. */ + p = (uint8_t *) p + l; + n -= l; + continue; /* Interrupted by a signal; keep going. */ + } + else if (l == 0) + arc4random_getrandom_failure (); /* Weird, should never happen. 
*/ + else if (l == -EINTR) + continue; /* Interrupted by a signal; keep going. */ + else if (!__ASSUME_GETRANDOM && l == -ENOSYS) + { + atomic_store_relaxed (&have_getrandom, false); + break; /* No syscall, so fallback to /dev/urandom. */ + } + arc4random_getrandom_failure (); /* Weird, should never happen. */ } - else - state->count -= len; -} -void -__arc4random_buf (void *buffer, size_t len) -{ - struct arc4random_state_t *state = arc4random_get_state (); - if (__glibc_unlikely (state == NULL)) + if (!atomic_load_relaxed (&seen_initialized)) { - arc4random_getentropy (buffer, len); - return; + struct pollfd pfd = { .events = POLLIN }; + pfd.fd = TEMP_FAILURE_RETRY ( + __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY)); + if (pfd.fd < 0) + arc4random_getrandom_failure (); + if (TEMP_FAILURE_RETRY (__ppoll_infinity_nocancel (&pfd, 1)) < 0) + arc4random_getrandom_failure (); + if (__close_nocancel (pfd.fd) < 0) + arc4random_getrandom_failure (); + atomic_store_relaxed (&seen_initialized, true); } - arc4random_check_stir (state, len); - while (len > 0) + fd = TEMP_FAILURE_RETRY ( + __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY)); + if (fd < 0) + arc4random_getrandom_failure (); + do { - if (state->have > 0) - { - size_t m = MIN (len, state->have); - uint8_t *ks = state->buf + sizeof (state->buf) - state->have; - memcpy (buffer, ks, m); - explicit_bzero (ks, m); - buffer += m; - len -= m; - state->have -= m; - } - if (state->have == 0) - arc4random_rekey (state, NULL, 0); + ssize_t l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n)); + if (l <= 0) + arc4random_getrandom_failure (); + p = (uint8_t *) p + l; + n -= l; } + while (n); + if (__close_nocancel (fd) < 0) + arc4random_getrandom_failure (); } libc_hidden_def (__arc4random_buf) weak_alias (__arc4random_buf, arc4random_buf) @@ -186,22 +104,7 @@ uint32_t __arc4random (void) { uint32_t r; - - struct arc4random_state_t *state = arc4random_get_state (); - if (__glibc_unlikely 
(state == NULL)) - { - arc4random_getentropy (&r, sizeof (uint32_t)); - return r; - } - - arc4random_check_stir (state, sizeof (uint32_t)); - if (state->have < sizeof (uint32_t)) - arc4random_rekey (state, NULL, 0); - uint8_t *ks = state->buf + sizeof (state->buf) - state->have; - memcpy (&r, ks, sizeof (uint32_t)); - memset (ks, 0, sizeof (uint32_t)); - state->have -= sizeof (uint32_t); - + __arc4random_buf (&r, sizeof (r)); return r; } libc_hidden_def (__arc4random) diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h deleted file mode 100644 index cd39389c19..0000000000 --- a/stdlib/arc4random.h +++ /dev/null @@ -1,48 +0,0 @@ -/* Arc4random definition used on TLS. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#ifndef _CHACHA20_H -#define _CHACHA20_H - -#include <stddef.h> -#include <stdint.h> - -/* Internal ChaCha20 state. */ -#define CHACHA20_STATE_LEN 16 -#define CHACHA20_BLOCK_SIZE 64 - -/* Maximum number bytes until reseed (16 MB). */ -#define CHACHA20_RESEED_SIZE (16 * 1024 * 1024) - -/* Internal arc4random buffer, used on each feedback step so offer some - backtracking protection and to allow better used of vectorized - chacha20 implementations. 
*/ -#define CHACHA20_BUFSIZE (8 * CHACHA20_BLOCK_SIZE) - -_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE"); - -struct arc4random_state_t -{ - uint32_t ctx[CHACHA20_STATE_LEN]; - size_t have; - size_t count; - uint8_t buf[CHACHA20_BUFSIZE]; -}; - -#endif diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c deleted file mode 100644 index 2745a81315..0000000000 --- a/stdlib/chacha20.c +++ /dev/null @@ -1,191 +0,0 @@ -/* Generic ChaCha20 implementation (used on arc4random). - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <array_length.h> -#include <endian.h> -#include <stddef.h> -#include <stdint.h> -#include <string.h> - -/* 32-bit stream position, then 96-bit nonce. */ -#define CHACHA20_IV_SIZE 16 -#define CHACHA20_KEY_SIZE 32 - -#define CHACHA20_STATE_LEN 16 - -/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final - XOR of the keystream with the plaintext because the plaintext is a - stream of zeros. 
*/ - -enum chacha20_constants -{ - CHACHA20_CONSTANT_EXPA = 0x61707865U, - CHACHA20_CONSTANT_ND_3 = 0x3320646eU, - CHACHA20_CONSTANT_2_BY = 0x79622d32U, - CHACHA20_CONSTANT_TE_K = 0x6b206574U -}; - -static inline uint32_t -read_unaligned_32 (const uint8_t *p) -{ - uint32_t r; - memcpy (&r, p, sizeof (r)); - return r; -} - -static inline void -write_unaligned_32 (uint8_t *p, uint32_t v) -{ - memcpy (p, &v, sizeof (v)); -} - -#if __BYTE_ORDER == __BIG_ENDIAN -# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p)) -# define set_state(v) __builtin_bswap32 ((v)) -#else -# define read_unaligned_le32(p) read_unaligned_32 ((p)) -# define set_state(v) (v) -#endif - -static inline void -chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv) -{ - state[0] = CHACHA20_CONSTANT_EXPA; - state[1] = CHACHA20_CONSTANT_ND_3; - state[2] = CHACHA20_CONSTANT_2_BY; - state[3] = CHACHA20_CONSTANT_TE_K; - - state[4] = read_unaligned_le32 (key + 0 * sizeof (uint32_t)); - state[5] = read_unaligned_le32 (key + 1 * sizeof (uint32_t)); - state[6] = read_unaligned_le32 (key + 2 * sizeof (uint32_t)); - state[7] = read_unaligned_le32 (key + 3 * sizeof (uint32_t)); - state[8] = read_unaligned_le32 (key + 4 * sizeof (uint32_t)); - state[9] = read_unaligned_le32 (key + 5 * sizeof (uint32_t)); - state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t)); - state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t)); - - state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t)); - state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t)); - state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t)); - state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t)); -} - -static inline uint32_t -rotl32 (unsigned int shift, uint32_t word) -{ - return (word << (shift & 31)) | (word >> ((-shift) & 31)); -} - -static void -state_final (const uint8_t *src, uint8_t *dst, uint32_t v) -{ -#ifdef CHACHA20_XOR_FINAL - v ^= read_unaligned_32 (src); -#endif - 
write_unaligned_32 (dst, v); -} - -static inline void -chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src) -{ - uint32_t x0, x1, x2, x3, x4, x5, x6, x7; - uint32_t x8, x9, x10, x11, x12, x13, x14, x15; - - x0 = state[0]; - x1 = state[1]; - x2 = state[2]; - x3 = state[3]; - x4 = state[4]; - x5 = state[5]; - x6 = state[6]; - x7 = state[7]; - x8 = state[8]; - x9 = state[9]; - x10 = state[10]; - x11 = state[11]; - x12 = state[12]; - x13 = state[13]; - x14 = state[14]; - x15 = state[15]; - - for (int i = 0; i < 20; i += 2) - { -#define QROUND(_x0, _x1, _x2, _x3) \ - do { \ - _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); \ - _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); \ - _x0 = _x0 + _x1; _x3 = rotl32 (8, (_x0 ^ _x3)); \ - _x2 = _x2 + _x3; _x1 = rotl32 (7, (_x1 ^ _x2)); \ - } while(0) - - QROUND (x0, x4, x8, x12); - QROUND (x1, x5, x9, x13); - QROUND (x2, x6, x10, x14); - QROUND (x3, x7, x11, x15); - - QROUND (x0, x5, x10, x15); - QROUND (x1, x6, x11, x12); - QROUND (x2, x7, x8, x13); - QROUND (x3, x4, x9, x14); - } - - state_final (&src[0], &dst[0], set_state (x0 + state[0])); - state_final (&src[4], &dst[4], set_state (x1 + state[1])); - state_final (&src[8], &dst[8], set_state (x2 + state[2])); - state_final (&src[12], &dst[12], set_state (x3 + state[3])); - state_final (&src[16], &dst[16], set_state (x4 + state[4])); - state_final (&src[20], &dst[20], set_state (x5 + state[5])); - state_final (&src[24], &dst[24], set_state (x6 + state[6])); - state_final (&src[28], &dst[28], set_state (x7 + state[7])); - state_final (&src[32], &dst[32], set_state (x8 + state[8])); - state_final (&src[36], &dst[36], set_state (x9 + state[9])); - state_final (&src[40], &dst[40], set_state (x10 + state[10])); - state_final (&src[44], &dst[44], set_state (x11 + state[11])); - state_final (&src[48], &dst[48], set_state (x12 + state[12])); - state_final (&src[52], &dst[52], set_state (x13 + state[13])); - state_final (&src[56], &dst[56], set_state (x14 + state[14])); 
- state_final (&src[60], &dst[60], set_state (x15 + state[15])); - - state[12]++; -} - -static void -__attribute_maybe_unused__ -chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - while (bytes >= CHACHA20_BLOCK_SIZE) - { - chacha20_block (state, dst, src); - - bytes -= CHACHA20_BLOCK_SIZE; - dst += CHACHA20_BLOCK_SIZE; - src += CHACHA20_BLOCK_SIZE; - } - - if (__glibc_unlikely (bytes != 0)) - { - uint8_t stream[CHACHA20_BLOCK_SIZE]; - chacha20_block (state, stream, src); - memcpy (dst, stream, bytes); - explicit_bzero (stream, sizeof stream); - } -} - -/* Get the architecture optimized version. */ -#include <chacha20_arch.h> diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c deleted file mode 100644 index 45ba54920d..0000000000 --- a/stdlib/tst-arc4random-chacha20.c +++ /dev/null @@ -1,167 +0,0 @@ -/* Basic tests for chacha20 cypher used in arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <arc4random.h> -#include <support/check.h> -#include <sys/cdefs.h> - -/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random - actual does. 
*/ -#include <chacha20.c> - -static int -do_test (void) -{ - const uint8_t key[CHACHA20_KEY_SIZE] = - { - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - }; - const uint8_t iv[CHACHA20_IV_SIZE] = - { - 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - }; - const uint8_t expected1[CHACHA20_BUFSIZE] = - { - 0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a, - 0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d, - 0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda, - 0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f, - 0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1, - 0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07, - 0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73, - 0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69, - 0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21, - 0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8, - 0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac, - 0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6, - 0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0, - 0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3, - 0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f, - 0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e, - 0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71, - 0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3, - 0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5, - 0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34, - 0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d, - 0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c, - 0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 
0x6a, 0x1e, 0x6c, 0x31, - 0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad, - 0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0, - 0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72, - 0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16, - 0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9, - 0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a, - 0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc, - 0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2, - 0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb, - 0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10, - 0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b, - 0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2, - 0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17, - 0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca, - 0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84, - 0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3, - 0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1, - 0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7, - 0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9, - 0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3, - 0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e, - 0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb, - 0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8, - 0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb - }; - - const uint8_t expected2[CHACHA20_BUFSIZE] = - { - 0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36, - 0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22, - 0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50, - 0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10, - 0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 
0xb8, 0x94, 0xbb, 0xef, - 0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30, - 0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c, - 0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6, - 0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e, - 0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90, - 0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3, - 0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba, - 0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4, - 0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea, - 0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a, - 0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80, - 0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2, - 0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8, - 0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce, - 0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb, - 0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9, - 0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc, - 0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86, - 0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0, - 0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4, - 0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1, - 0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4, - 0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f, - 0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57, - 0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97, - 0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef, - 0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13, - 0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65, - 0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30, - 
0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6, - 0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee, - 0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c, - 0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66, - 0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a, - 0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c, - 0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65, - 0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b, - 0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88, - 0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1, - 0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0, - 0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36, - 0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18 - }; - - /* Check with the expected internal arc4random keystream buffer. Some - architecture optimizations expects a buffer with a minimum size which - is a multiple of then ChaCha20 blocksize, so they might not be prepared - to handle smaller buffers. */ - - uint8_t output[CHACHA20_BUFSIZE]; - - uint32_t state[CHACHA20_STATE_LEN]; - chacha20_init (state, key, iv); - - /* Check with the initial state. */ - uint8_t input[CHACHA20_BUFSIZE] = { 0 }; - - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); - TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE); - - /* And on the next round. 
*/ - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); - TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE); - - return 0; -} - -#include <support/test-driver.c> diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile index 7dfd1b62dd..17fb1c5b72 100644 --- a/sysdeps/aarch64/Makefile +++ b/sysdeps/aarch64/Makefile @@ -51,10 +51,6 @@ ifeq ($(subdir),csu) gen-as-const-headers += tlsdesc.sym endif -ifeq ($(subdir),stdlib) -sysdep_routines += chacha20-aarch64 -endif - ifeq ($(subdir),gmon) CFLAGS-mcount.c += -mgeneral-regs-only endif diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S deleted file mode 100644 index cce5291c5c..0000000000 --- a/sysdeps/aarch64/chacha20-aarch64.S +++ /dev/null @@ -1,314 +0,0 @@ -/* Optimized AArch64 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. 
- - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. - */ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. */ - -#include <sysdep.h> - -/* Only LE is supported. */ -#ifdef __AARCH64EL__ - -#define GET_DATA_POINTER(reg, name) \ - adrp reg, name ; \ - add reg, reg, :lo12:name - -/* 'ret' instruction replacement for straight-line speculation mitigation */ -#define ret_spec_stop \ - ret; dsb sy; isb; - -.cpu generic+simd - -.text - -/* register macros */ -#define INPUT x0 -#define DST x1 -#define SRC x2 -#define NBLKS x3 -#define ROUND x4 -#define INPUT_CTR x5 -#define INPUT_POS x6 -#define CTR x7 - -/* vector registers */ -#define X0 v16 -#define X4 v17 -#define X8 v18 -#define X12 v19 - -#define X1 v20 -#define X5 v21 - -#define X9 v22 -#define X13 v23 -#define X2 v24 -#define X6 v25 - -#define X3 v26 -#define X7 v27 -#define X11 v28 -#define X15 v29 - -#define X10 v30 -#define X14 v31 - -#define VCTR v0 -#define VTMP0 v1 -#define VTMP1 v2 -#define VTMP2 v3 -#define VTMP3 v4 -#define X12_TMP v5 -#define X13_TMP v6 -#define ROT8 v7 - -/********************************************************************** - helper macros - **********************************************************************/ - -#define _(...) 
__VA_ARGS__ - -#define vpunpckldq(s1, s2, dst) \ - zip1 dst.4s, s2.4s, s1.4s; - -#define vpunpckhdq(s1, s2, dst) \ - zip2 dst.4s, s2.4s, s1.4s; - -#define vpunpcklqdq(s1, s2, dst) \ - zip1 dst.2d, s2.2d, s1.2d; - -#define vpunpckhqdq(s1, s2, dst) \ - zip2 dst.2d, s2.2d, s1.2d; - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \ - vpunpckhdq(x1, x0, t2); \ - vpunpckldq(x1, x0, x0); \ - \ - vpunpckldq(x3, x2, t1); \ - vpunpckhdq(x3, x2, x2); \ - \ - vpunpckhqdq(t1, x0, x1); \ - vpunpcklqdq(t1, x0, x0); \ - \ - vpunpckhqdq(x2, t2, x3); \ - vpunpcklqdq(x2, t2, x2); - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define XOR(d,s1,s2) \ - eor d.16b, s2.16b, s1.16b; - -#define PLUS(ds,s) \ - add ds.4s, ds.4s, s.4s; - -#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \ - shl dst1.4s, src1.4s, #(c); \ - shl dst2.4s, src2.4s, #(c); \ - shl dst3.4s, src3.4s, #(c); \ - shl dst4.4s, src4.4s, #(c); \ - sri dst1.4s, src1.4s, #(32 - (c)); \ - sri dst2.4s, src2.4s, #(32 - (c)); \ - sri dst3.4s, src3.4s, #(32 - (c)); \ - sri dst4.4s, src4.4s, #(32 - (c)); - -#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ - tbl dst1.16b, {src1.16b}, ROT8.16b; \ - tbl dst2.16b, {src2.16b}, ROT8.16b; \ - tbl dst3.16b, {src3.16b}, ROT8.16b; \ - tbl dst4.16b, {src4.16b}, ROT8.16b; - -#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ - rev32 dst1.8h, src1.8h; \ - rev32 dst2.8h, src2.8h; \ - rev32 dst3.8h, src3.8h; \ - rev32 dst4.8h, src4.8h; - -#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \ - PLUS(a1,b1); PLUS(a2,b2); \ - PLUS(a3,b3); PLUS(a4,b4); \ - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ - ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4); \ - PLUS(c1,d1); PLUS(c2,d2); \ - PLUS(c3,d3); PLUS(c4,d4); \ - XOR(tmp1,b1,c1); 
XOR(tmp2,b2,c2); \ - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ - ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4) \ - PLUS(a1,b1); PLUS(a2,b2); \ - PLUS(a3,b3); PLUS(a4,b4); \ - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ - ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4) \ - PLUS(c1,d1); PLUS(c2,d2); \ - PLUS(c3,d3); PLUS(c4,d4); \ - XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \ - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ - ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4) \ - -.align 4 -L(__chacha20_blocks4_data_inc_counter): - .long 0,1,2,3 - -.align 4 -L(__chacha20_blocks4_data_rot8): - .byte 3,0,1,2 - .byte 7,4,5,6 - .byte 11,8,9,10 - .byte 15,12,13,14 - -.hidden __chacha20_neon_blocks4 -ENTRY (__chacha20_neon_blocks4) - /* input: - * x0: input - * x1: dst - * x2: src - * x3: nblks (multiple of 4) - */ - - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8)) - add INPUT_CTR, INPUT, #(12*4); - ld1 {ROT8.16b}, [CTR]; - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter)) - mov INPUT_POS, INPUT; - ld1 {VCTR.16b}, [CTR]; - -L(loop4): - /* Construct counter vectors X12 and X13 */ - - ld1 {X15.16b}, [INPUT_CTR]; - mov ROUND, #20; - ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS]; - - dup X12.4s, X15.s[0]; - dup X13.4s, X15.s[1]; - ldr CTR, [INPUT_CTR]; - add X12.4s, X12.4s, VCTR.4s; - dup X0.4s, VTMP1.s[0]; - dup X1.4s, VTMP1.s[1]; - dup X2.4s, VTMP1.s[2]; - dup X3.4s, VTMP1.s[3]; - dup X14.4s, X15.s[2]; - cmhi VTMP0.4s, VCTR.4s, X12.4s; - dup X15.4s, X15.s[3]; - add CTR, CTR, #4; /* Update counter */ - dup X4.4s, VTMP2.s[0]; - dup X5.4s, VTMP2.s[1]; - dup X6.4s, VTMP2.s[2]; - dup X7.4s, VTMP2.s[3]; - sub X13.4s, X13.4s, VTMP0.4s; - dup X8.4s, VTMP3.s[0]; - dup X9.4s, VTMP3.s[1]; - dup X10.4s, VTMP3.s[2]; - dup X11.4s, VTMP3.s[3]; - mov X12_TMP.16b, X12.16b; - mov X13_TMP.16b, X13.16b; - str CTR, [INPUT_CTR]; - -L(round2): - subs ROUND, ROUND, #2 - QUARTERROUND4(X0, X4, X8, X12, X1, X5, X9, X13, - X2, X6, X10, X14, X3, X7, X11, X15, - 
tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) - QUARTERROUND4(X0, X5, X10, X15, X1, X6, X11, X12, - X2, X7, X8, X13, X3, X4, X9, X14, - tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) - b.ne L(round2); - - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32; - - PLUS(X12, X12_TMP); /* INPUT + 12 * 4 + counter */ - PLUS(X13, X13_TMP); /* INPUT + 13 * 4 + counter */ - - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */ - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */ - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */ - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */ - PLUS(X0, VTMP2); - PLUS(X1, VTMP3); - PLUS(X2, X12_TMP); - PLUS(X3, X13_TMP); - - dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */ - dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */ - dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */ - dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */ - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS]; - mov INPUT_POS, INPUT; - PLUS(X4, VTMP2); - PLUS(X5, VTMP3); - PLUS(X6, X12_TMP); - PLUS(X7, X13_TMP); - - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */ - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */ - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */ - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */ - dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */ - dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */ - PLUS(X8, VTMP2); - PLUS(X9, VTMP3); - PLUS(X10, X12_TMP); - PLUS(X11, X13_TMP); - PLUS(X14, VTMP0); - PLUS(X15, VTMP1); - - transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2); - transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2); - transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2); - transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2); - - subs NBLKS, NBLKS, #4; - - st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64 - st1 {X1.16b,X5.16b}, [DST], #32; - st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64 - st1 {X10.16b,X14.16b}, [DST], #32; - st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64; - - b.ne L(loop4); - - ret_spec_stop -END (__chacha20_neon_blocks4) - -#endif diff --git a/sysdeps/aarch64/chacha20_arch.h 
b/sysdeps/aarch64/chacha20_arch.h deleted file mode 100644 index 37dbb917f1..0000000000 --- a/sysdeps/aarch64/chacha20_arch.h +++ /dev/null @@ -1,40 +0,0 @@ -/* Chacha20 implementation, used on arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <ldsodefs.h> -#include <stdbool.h> - -unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, - "CHACHA20_BUFSIZE not multiple of 4"); - _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4, - "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4"); -#ifdef __AARCH64EL__ - __chacha20_neon_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -#else - chacha20_crypt_generic (state, dst, src, bytes); -#endif -} diff --git a/sysdeps/generic/tls-internal-struct.h b/sysdeps/generic/tls-internal-struct.h index a91915831b..d76c715a96 100644 --- a/sysdeps/generic/tls-internal-struct.h +++ b/sysdeps/generic/tls-internal-struct.h @@ -23,7 +23,6 @@ struct tls_internal_t { char *strsignal_buf; char *strerror_l_buf; - struct arc4random_state_t *rand_state; }; #endif diff --git 
a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c index 8a0f37d509..b32b31b5a9 100644 --- a/sysdeps/generic/tls-internal.c +++ b/sysdeps/generic/tls-internal.c @@ -16,7 +16,6 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. */ -#include <stdlib/arc4random.h> #include <string.h> #include <tls-internal.h> @@ -27,13 +26,4 @@ __glibc_tls_internal_free (void) { free (__tls_internal.strsignal_buf); free (__tls_internal.strerror_l_buf); - - if (__tls_internal.rand_state != NULL) - { - /* Clear any lingering random state prior so if the thread stack is - cached it won't leak any data. */ - explicit_bzero (__tls_internal.rand_state, - sizeof (*__tls_internal.rand_state)); - free (__tls_internal.rand_state); - } } diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c index 667068c8cf..e60b86fab1 100644 --- a/sysdeps/mach/hurd/_Fork.c +++ b/sysdeps/mach/hurd/_Fork.c @@ -662,8 +662,6 @@ retry: _hurd_malloc_fork_child (); call_function_static_weak (__malloc_fork_unlock_child); - call_function_static_weak (__arc4random_fork_subprocess); - /* Run things that want to run in the child task to set up. */ RUN_HOOK (_hurd_fork_child_hook, ()); diff --git a/sysdeps/mach/hurd/kernel-features.h b/sysdeps/mach/hurd/kernel-features.h index a7579f6d68..ce97627dc8 100644 --- a/sysdeps/mach/hurd/kernel-features.h +++ b/sysdeps/mach/hurd/kernel-features.h @@ -21,3 +21,4 @@ But those referring to POSIX-level features like O_* flags can be. 
*/ #define __ASSUME_CLOSE_RANGE 1 +#define __ASSUME_GETRANDOM 1 diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c index 7dc02569f6..dd568992e2 100644 --- a/sysdeps/nptl/_Fork.c +++ b/sysdeps/nptl/_Fork.c @@ -43,8 +43,6 @@ _Fork (void) self->robust_head.list = &self->robust_head; INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head, sizeof (struct robust_list_head)); - - call_function_static_weak (__arc4random_fork_subprocess); } return pid; } diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile deleted file mode 100644 index 8c75165f7f..0000000000 --- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile +++ /dev/null @@ -1,4 +0,0 @@ -ifeq ($(subdir),stdlib) -sysdep_routines += chacha20-ppc -CFLAGS-chacha20-ppc.c += -mcpu=power8 -endif diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c deleted file mode 100644 index cf9e735326..0000000000 --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c +++ /dev/null @@ -1 +0,0 @@ -#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h deleted file mode 100644 index 08494dc045..0000000000 --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h +++ /dev/null @@ -1,42 +0,0 @@ -/* PowerPC optimization for ChaCha20. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <stdbool.h> -#include <ldsodefs.h> - -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static void -chacha20_crypt (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, - "CHACHA20_BUFSIZE not multiple of 4"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); - - unsigned long int hwcap = GLRO(dl_hwcap); - unsigned long int hwcap2 = GLRO(dl_hwcap2); - if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC) - __chacha20_power8_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); - else - chacha20_crypt_generic (state, dst, src, bytes); -} diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile index abb0aa3f11..71a59529f3 100644 --- a/sysdeps/powerpc/powerpc64/power8/Makefile +++ b/sysdeps/powerpc/powerpc64/power8/Makefile @@ -1,8 +1,3 @@ ifeq ($(subdir),string) sysdep_routines += strcasestr-ppc64 endif - -ifeq ($(subdir),stdlib) -sysdep_routines += chacha20-ppc -CFLAGS-chacha20-ppc.c += -mcpu=power8 -endif diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c deleted file mode 100644 index 0bbdcb9363..0000000000 --- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c +++ /dev/null @@ -1,256 +0,0 @@ -/* Optimized PowerPC implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. 
- - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20 - Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. 
- */ - -#include <altivec.h> -#include <endian.h> -#include <stddef.h> -#include <stdint.h> -#include <sys/cdefs.h> - -typedef vector unsigned char vector16x_u8; -typedef vector unsigned int vector4x_u32; -typedef vector unsigned long long vector2x_u64; - -#if __BYTE_ORDER == __BIG_ENDIAN -static const vector16x_u8 le_bswap_const = - { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 }; -#endif - -static inline vector4x_u32 -vec_rol_elems (vector4x_u32 v, unsigned int idx) -{ -#if __BYTE_ORDER != __BIG_ENDIAN - return vec_sld (v, v, (16 - (4 * idx)) & 15); -#else - return vec_sld (v, v, (4 * idx) & 15); -#endif -} - -static inline vector4x_u32 -vec_load_le (unsigned long offset, const unsigned char *ptr) -{ - vector4x_u32 vec; - vec = vec_vsx_ld (offset, (const uint32_t *)ptr); -#if __BYTE_ORDER == __BIG_ENDIAN - vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec, - le_bswap_const); -#endif - return vec; -} - -static inline void -vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr) -{ -#if __BYTE_ORDER == __BIG_ENDIAN - vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec, - le_bswap_const); -#endif - vec_vsx_st (vec, offset, (uint32_t *)ptr); -} - - -static inline vector4x_u32 -vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a) -{ -#if __BYTE_ORDER == __BIG_ENDIAN - static const vector16x_u8 swap32 = - { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 }; - vector2x_u64 vec, add, sum; - - vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32); - add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32); - sum = vec + add; - return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32); -#else - return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a)); -#endif -} - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define ROTATE(v1,rolv) \ - 
__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv)) - -#define PLUS(ds,s) \ - ((ds) += (s)) - -#define XOR(ds,s) \ - ((ds) ^= (s)) - -#define ADD_U64(v,a) \ - (v = vec_add_ctr_u64(v, a)) - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0, x1, x2, x3) ({ \ - vector4x_u32 t1 = vec_mergeh(x0, x2); \ - vector4x_u32 t2 = vec_mergel(x0, x2); \ - vector4x_u32 t3 = vec_mergeh(x1, x3); \ - x3 = vec_mergel(x1, x3); \ - x0 = vec_mergeh(t1, t3); \ - x1 = vec_mergel(t1, t3); \ - x2 = vec_mergeh(t2, x3); \ - x3 = vec_mergel(t2, x3); \ - }) - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2) \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE(d1, rotate_16); ROTATE(d2, rotate_16); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE(b1, rotate_12); ROTATE(b2, rotate_12); \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE(d1, rotate_8); ROTATE(d2, rotate_8); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE(b1, rotate_7); ROTATE(b2, rotate_7); - -unsigned int attribute_hidden -__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t nblks) -{ - vector4x_u32 counters_0123 = { 0, 1, 2, 3 }; - vector4x_u32 counter_4 = { 4, 0, 0, 0 }; - vector4x_u32 rotate_16 = { 16, 16, 16, 16 }; - vector4x_u32 rotate_12 = { 12, 12, 12, 12 }; - vector4x_u32 rotate_8 = { 8, 8, 8, 8 }; - vector4x_u32 rotate_7 = { 7, 7, 7, 7 }; - vector4x_u32 state0, state1, state2, state3; - vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7; - vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15; - vector4x_u32 tmp; - int i; - - /* Force preload of constants to vector registers. 
*/ - __asm__ ("": "+v" (counters_0123) :: "memory"); - __asm__ ("": "+v" (counter_4) :: "memory"); - __asm__ ("": "+v" (rotate_16) :: "memory"); - __asm__ ("": "+v" (rotate_12) :: "memory"); - __asm__ ("": "+v" (rotate_8) :: "memory"); - __asm__ ("": "+v" (rotate_7) :: "memory"); - - state0 = vec_vsx_ld (0 * 16, state); - state1 = vec_vsx_ld (1 * 16, state); - state2 = vec_vsx_ld (2 * 16, state); - state3 = vec_vsx_ld (3 * 16, state); - - do - { - v0 = vec_splat (state0, 0); - v1 = vec_splat (state0, 1); - v2 = vec_splat (state0, 2); - v3 = vec_splat (state0, 3); - v4 = vec_splat (state1, 0); - v5 = vec_splat (state1, 1); - v6 = vec_splat (state1, 2); - v7 = vec_splat (state1, 3); - v8 = vec_splat (state2, 0); - v9 = vec_splat (state2, 1); - v10 = vec_splat (state2, 2); - v11 = vec_splat (state2, 3); - v12 = vec_splat (state3, 0); - v13 = vec_splat (state3, 1); - v14 = vec_splat (state3, 2); - v15 = vec_splat (state3, 3); - - v12 += counters_0123; - v13 -= vec_cmplt (v12, counters_0123); - - for (i = 20; i > 0; i -= 2) - { - QUARTERROUND2 (v0, v4, v8, v12, v1, v5, v9, v13) - QUARTERROUND2 (v2, v6, v10, v14, v3, v7, v11, v15) - QUARTERROUND2 (v0, v5, v10, v15, v1, v6, v11, v12) - QUARTERROUND2 (v2, v7, v8, v13, v3, v4, v9, v14) - } - - v0 += vec_splat (state0, 0); - v1 += vec_splat (state0, 1); - v2 += vec_splat (state0, 2); - v3 += vec_splat (state0, 3); - v4 += vec_splat (state1, 0); - v5 += vec_splat (state1, 1); - v6 += vec_splat (state1, 2); - v7 += vec_splat (state1, 3); - v8 += vec_splat (state2, 0); - v9 += vec_splat (state2, 1); - v10 += vec_splat (state2, 2); - v11 += vec_splat (state2, 3); - tmp = vec_splat( state3, 0); - tmp += counters_0123; - v12 += tmp; - v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123); - v14 += vec_splat (state3, 2); - v15 += vec_splat (state3, 3); - ADD_U64 (state3, counter_4); - - transpose_4x4 (v0, v1, v2, v3); - transpose_4x4 (v4, v5, v6, v7); - transpose_4x4 (v8, v9, v10, v11); - transpose_4x4 (v12, v13, v14, v15); 
- - vec_store_le (v0, (64 * 0 + 16 * 0), dst); - vec_store_le (v1, (64 * 1 + 16 * 0), dst); - vec_store_le (v2, (64 * 2 + 16 * 0), dst); - vec_store_le (v3, (64 * 3 + 16 * 0), dst); - - vec_store_le (v4, (64 * 0 + 16 * 1), dst); - vec_store_le (v5, (64 * 1 + 16 * 1), dst); - vec_store_le (v6, (64 * 2 + 16 * 1), dst); - vec_store_le (v7, (64 * 3 + 16 * 1), dst); - - vec_store_le (v8, (64 * 0 + 16 * 2), dst); - vec_store_le (v9, (64 * 1 + 16 * 2), dst); - vec_store_le (v10, (64 * 2 + 16 * 2), dst); - vec_store_le (v11, (64 * 3 + 16 * 2), dst); - - vec_store_le (v12, (64 * 0 + 16 * 3), dst); - vec_store_le (v13, (64 * 1 + 16 * 3), dst); - vec_store_le (v14, (64 * 2 + 16 * 3), dst); - vec_store_le (v15, (64 * 3 + 16 * 3), dst); - - src += 4*64; - dst += 4*64; - - nblks -= 4; - } - while (nblks); - - vec_vsx_st (state3, 3 * 16, state); - - return 0; -} diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h deleted file mode 100644 index ded06762b6..0000000000 --- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h +++ /dev/null @@ -1,37 +0,0 @@ -/* PowerPC optimization for ChaCha20. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <stdbool.h> -#include <ldsodefs.h> - -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static void -chacha20_crypt (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, - "CHACHA20_BUFSIZE not multiple of 4"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); - - __chacha20_power8_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -} diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile index 96c110f490..66ed844e68 100644 --- a/sysdeps/s390/s390-64/Makefile +++ b/sysdeps/s390/s390-64/Makefile @@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache endif endif # $(subdir) == elf - -ifeq ($(subdir),stdlib) -sysdep_routines += \ - chacha20-s390x \ - # sysdep_routines -endif diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S deleted file mode 100644 index e38504d370..0000000000 --- a/sysdeps/s390/s390-64/chacha20-s390x.S +++ /dev/null @@ -1,573 +0,0 @@ -/* Optimized s390x implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -/* chacha20-s390x.S - zSeries implementation of ChaCha20 cipher - - Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. - */ - -#include <sysdep.h> - -#ifdef HAVE_S390_VX_ASM_SUPPORT - -/* CFA expressions are used for pointing CFA and registers to - * SP relative offsets. */ -# define DW_REGNO_SP 15 - -/* Fixed length encoding used for integers for now. 
*/ -# define DW_SLEB128_7BIT(value) \ - 0x00|((value) & 0x7f) -# define DW_SLEB128_28BIT(value) \ - 0x80|((value)&0x7f), \ - 0x80|(((value)>>7)&0x7f), \ - 0x80|(((value)>>14)&0x7f), \ - 0x00|(((value)>>21)&0x7f) - -# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \ - .cfi_escape \ - 0x0f, /* DW_CFA_def_cfa_expression */ \ - DW_SLEB128_7BIT(11), /* length */ \ - 0x7f, /* DW_OP_breg15, rsp + constant */ \ - DW_SLEB128_28BIT(rsp_offs), \ - 0x06, /* DW_OP_deref */ \ - 0x23, /* DW_OP_plus_constu */ \ - DW_SLEB128_28BIT((cfa_depth)+160) - -.machine "z13+vx" -.text - -.balign 16 -.Lconsts: -.Lwordswap: - .byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3 -.Lbswap128: - .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 -.Lbswap32: - .byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 -.Lone: - .long 0, 0, 0, 1 -.Ladd_counter_0123: - .long 0, 1, 2, 3 -.Ladd_counter_4567: - .long 4, 5, 6, 7 - -/* register macros */ -#define INPUT %r2 -#define DST %r3 -#define SRC %r4 -#define NBLKS %r0 -#define ROUND %r1 - -/* stack structure */ - -#define STACK_FRAME_STD (8 * 16 + 8 * 4) -#define STACK_FRAME_F8_F15 (8 * 8) -#define STACK_FRAME_Y0_Y15 (16 * 16) -#define STACK_FRAME_CTR (4 * 16) -#define STACK_FRAME_PARAMS (6 * 8) - -#define STACK_MAX (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \ - STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \ - STACK_FRAME_PARAMS) - -#define STACK_F8 (STACK_MAX - STACK_FRAME_F8_F15) -#define STACK_F9 (STACK_F8 + 8) -#define STACK_F10 (STACK_F9 + 8) -#define STACK_F11 (STACK_F10 + 8) -#define STACK_F12 (STACK_F11 + 8) -#define STACK_F13 (STACK_F12 + 8) -#define STACK_F14 (STACK_F13 + 8) -#define STACK_F15 (STACK_F14 + 8) -#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15) -#define STACK_CTR (STACK_Y0_Y15 - STACK_FRAME_CTR) -#define STACK_INPUT (STACK_CTR - STACK_FRAME_PARAMS) -#define STACK_DST (STACK_INPUT + 8) -#define STACK_SRC (STACK_DST + 8) -#define STACK_NBLKS (STACK_SRC + 8) -#define STACK_POCTX (STACK_NBLKS + 8) -#define STACK_POSRC 
(STACK_POCTX + 8) - -#define STACK_G0_H3 STACK_Y0_Y15 - -/* vector registers */ -#define A0 %v0 -#define A1 %v1 -#define A2 %v2 -#define A3 %v3 - -#define B0 %v4 -#define B1 %v5 -#define B2 %v6 -#define B3 %v7 - -#define C0 %v8 -#define C1 %v9 -#define C2 %v10 -#define C3 %v11 - -#define D0 %v12 -#define D1 %v13 -#define D2 %v14 -#define D3 %v15 - -#define E0 %v16 -#define E1 %v17 -#define E2 %v18 -#define E3 %v19 - -#define F0 %v20 -#define F1 %v21 -#define F2 %v22 -#define F3 %v23 - -#define G0 %v24 -#define G1 %v25 -#define G2 %v26 -#define G3 %v27 - -#define H0 %v28 -#define H1 %v29 -#define H2 %v30 -#define H3 %v31 - -#define IO0 E0 -#define IO1 E1 -#define IO2 E2 -#define IO3 E3 -#define IO4 F0 -#define IO5 F1 -#define IO6 F2 -#define IO7 F3 - -#define S0 G0 -#define S1 G1 -#define S2 G2 -#define S3 G3 - -#define TMP0 H0 -#define TMP1 H1 -#define TMP2 H2 -#define TMP3 H3 - -#define X0 A0 -#define X1 A1 -#define X2 A2 -#define X3 A3 -#define X4 B0 -#define X5 B1 -#define X6 B2 -#define X7 B3 -#define X8 C0 -#define X9 C1 -#define X10 C2 -#define X11 C3 -#define X12 D0 -#define X13 D1 -#define X14 D2 -#define X15 D3 - -#define Y0 E0 -#define Y1 E1 -#define Y2 E2 -#define Y3 E3 -#define Y4 F0 -#define Y5 F1 -#define Y6 F2 -#define Y7 F3 -#define Y8 G0 -#define Y9 G1 -#define Y10 G2 -#define Y11 G3 -#define Y12 H0 -#define Y13 H1 -#define Y14 H2 -#define Y15 H3 - -/********************************************************************** - helper macros - **********************************************************************/ - -#define _ /*_*/ - -#define START_STACK(last_r) \ - lgr %r0, %r15; \ - lghi %r1, ~15; \ - stmg %r6, last_r, 6 * 8(%r15); \ - aghi %r0, -STACK_MAX; \ - ngr %r0, %r1; \ - lgr %r1, %r15; \ - cfi_def_cfa_register(1); \ - lgr %r15, %r0; \ - stg %r1, 0(%r15); \ - cfi_cfa_on_stack(0, 0); \ - std %f8, STACK_F8(%r15); \ - std %f9, STACK_F9(%r15); \ - std %f10, STACK_F10(%r15); \ - std %f11, STACK_F11(%r15); \ - std %f12, STACK_F12(%r15); \ - std %f13, 
STACK_F13(%r15); \ - std %f14, STACK_F14(%r15); \ - std %f15, STACK_F15(%r15); - -#define END_STACK(last_r) \ - lg %r1, 0(%r15); \ - ld %f8, STACK_F8(%r15); \ - ld %f9, STACK_F9(%r15); \ - ld %f10, STACK_F10(%r15); \ - ld %f11, STACK_F11(%r15); \ - ld %f12, STACK_F12(%r15); \ - ld %f13, STACK_F13(%r15); \ - ld %f14, STACK_F14(%r15); \ - ld %f15, STACK_F15(%r15); \ - lmg %r6, last_r, 6 * 8(%r1); \ - lgr %r15, %r1; \ - cfi_def_cfa_register(DW_REGNO_SP); - -#define PLUS(dst,src) \ - vaf dst, dst, src; - -#define XOR(dst,src) \ - vx dst, dst, src; - -#define ROTATE(v1,c) \ - verllf v1, v1, (c)(0); - -#define WORD_ROTATE(v1,s) \ - vsldb v1, v1, v1, ((s) * 4); - -#define DST_8(OPER, I, J) \ - OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \ - OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J); - -/********************************************************************** - round macros - **********************************************************************/ - -/********************************************************************** - 8-way chacha20 ("vertical") - **********************************************************************/ - -#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ - x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,\ - y8,y9,y10,y11,y12,y13,y14,y15,\ - op1,op2,op3,op4,op5,op6,op7,op8,\ - op9,op10,op11,op12) \ - op1; \ - PLUS(x0, x1); PLUS(x4, x5); \ - PLUS(x8, x9); PLUS(x12, x13); \ - PLUS(y0, y1); PLUS(y4, y5); \ - PLUS(y8, y9); PLUS(y12, y13); \ - op2; \ - XOR(x3, x0); XOR(x7, x4); \ - XOR(x11, x8); XOR(x15, x12); \ - XOR(y3, y0); XOR(y7, y4); \ - XOR(y11, y8); XOR(y15, y12); \ - op3; \ - ROTATE(x3, 16); ROTATE(x7, 16); \ - ROTATE(x11, 16); ROTATE(x15, 16); \ - ROTATE(y3, 16); ROTATE(y7, 16); \ - ROTATE(y11, 16); ROTATE(y15, 16); \ - op4; \ - PLUS(x2, x3); PLUS(x6, x7); \ - PLUS(x10, x11); PLUS(x14, x15); \ - PLUS(y2, y3); PLUS(y6, y7); \ - PLUS(y10, y11); PLUS(y14, y15); \ - op5; \ - XOR(x1, x2); XOR(x5, x6); \ - 
XOR(x9, x10); XOR(x13, x14); \ - XOR(y1, y2); XOR(y5, y6); \ - XOR(y9, y10); XOR(y13, y14); \ - op6; \ - ROTATE(x1,12); ROTATE(x5,12); \ - ROTATE(x9,12); ROTATE(x13,12); \ - ROTATE(y1,12); ROTATE(y5,12); \ - ROTATE(y9,12); ROTATE(y13,12); \ - op7; \ - PLUS(x0, x1); PLUS(x4, x5); \ - PLUS(x8, x9); PLUS(x12, x13); \ - PLUS(y0, y1); PLUS(y4, y5); \ - PLUS(y8, y9); PLUS(y12, y13); \ - op8; \ - XOR(x3, x0); XOR(x7, x4); \ - XOR(x11, x8); XOR(x15, x12); \ - XOR(y3, y0); XOR(y7, y4); \ - XOR(y11, y8); XOR(y15, y12); \ - op9; \ - ROTATE(x3,8); ROTATE(x7,8); \ - ROTATE(x11,8); ROTATE(x15,8); \ - ROTATE(y3,8); ROTATE(y7,8); \ - ROTATE(y11,8); ROTATE(y15,8); \ - op10; \ - PLUS(x2, x3); PLUS(x6, x7); \ - PLUS(x10, x11); PLUS(x14, x15); \ - PLUS(y2, y3); PLUS(y6, y7); \ - PLUS(y10, y11); PLUS(y14, y15); \ - op11; \ - XOR(x1, x2); XOR(x5, x6); \ - XOR(x9, x10); XOR(x13, x14); \ - XOR(y1, y2); XOR(y5, y6); \ - XOR(y9, y10); XOR(y13, y14); \ - op12; \ - ROTATE(x1,7); ROTATE(x5,7); \ - ROTATE(x9,7); ROTATE(x13,7); \ - ROTATE(y1,7); ROTATE(y5,7); \ - ROTATE(y9,7); ROTATE(y13,7); - -#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \ - QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ - x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,\ - y8,y9,y10,y11,y12,y13,y14,y15,\ - ,,,,,,,,,,,) - -#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \ - vmrhf tmp0, v0, v1; \ - vmrhf tmp1, v2, v3; \ - vmrlf tmp2, v0, v1; \ - vmrlf v3, v2, v3; \ - vmrhf tmpa, va, vb; \ - vmrhf tmpb, vc, vd; \ - vmrlf tmpc, va, vb; \ - vmrlf vd, vc, vd; \ - vpdi v0, tmp0, tmp1, 0; \ - vpdi v1, tmp0, tmp1, 5; \ - vpdi v2, tmp2, v3, 0; \ - vpdi v3, tmp2, v3, 5; \ - vpdi va, tmpa, tmpb, 0; \ - vpdi vb, tmpa, tmpb, 5; \ - vpdi vc, tmpc, vd, 0; \ - vpdi vd, tmpc, vd, 5; - -.balign 8 -.globl __chacha20_s390x_vx_blocks8 -ENTRY (__chacha20_s390x_vx_blocks8) - /* input: - * %r2: input - * %r3: dst - * 
%r4: src - * %r5: nblks (multiple of 8) - */ - - START_STACK(%r8); - lgr NBLKS, %r5; - - larl %r7, .Lconsts; - - /* Load counter. */ - lg %r8, (12 * 4)(INPUT); - rllg %r8, %r8, 32; - -.balign 4 - /* Process eight chacha20 blocks per loop. */ -.Lloop8: - vlm Y0, Y3, 0(INPUT); - - slgfi NBLKS, 8; - lghi ROUND, (20 / 2); - - /* Construct counter vectors X12/X13 & Y12/Y13. */ - vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7); - vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7); - vrepf Y12, Y3, 0; - vrepf Y13, Y3, 1; - vaccf X5, Y12, X4; - vaccf Y5, Y12, Y4; - vaf X12, Y12, X4; - vaf Y12, Y12, Y4; - vaf X13, Y13, X5; - vaf Y13, Y13, Y5; - - vrepf X0, Y0, 0; - vrepf X1, Y0, 1; - vrepf X2, Y0, 2; - vrepf X3, Y0, 3; - vrepf X4, Y1, 0; - vrepf X5, Y1, 1; - vrepf X6, Y1, 2; - vrepf X7, Y1, 3; - vrepf X8, Y2, 0; - vrepf X9, Y2, 1; - vrepf X10, Y2, 2; - vrepf X11, Y2, 3; - vrepf X14, Y3, 2; - vrepf X15, Y3, 3; - - /* Store counters for blocks 0-7. */ - vstm X12, X13, (STACK_CTR + 0 * 16)(%r15); - vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15); - - vlr Y0, X0; - vlr Y1, X1; - vlr Y2, X2; - vlr Y3, X3; - vlr Y4, X4; - vlr Y5, X5; - vlr Y6, X6; - vlr Y7, X7; - vlr Y8, X8; - vlr Y9, X9; - vlr Y10, X10; - vlr Y11, X11; - vlr Y14, X14; - vlr Y15, X15; - - /* Update and store counter. */ - agfi %r8, 8; - rllg %r5, %r8, 32; - stg %r5, (12 * 4)(INPUT); - -.balign 4 -.Lround2_8: - QUARTERROUND4_V8(X0, X4, X8, X12, X1, X5, X9, X13, - X2, X6, X10, X14, X3, X7, X11, X15, - Y0, Y4, Y8, Y12, Y1, Y5, Y9, Y13, - Y2, Y6, Y10, Y14, Y3, Y7, Y11, Y15); - QUARTERROUND4_V8(X0, X5, X10, X15, X1, X6, X11, X12, - X2, X7, X8, X13, X3, X4, X9, X14, - Y0, Y5, Y10, Y15, Y1, Y6, Y11, Y12, - Y2, Y7, Y8, Y13, Y3, Y4, Y9, Y14); - brctg ROUND, .Lround2_8; - - /* Store blocks 4-7. */ - vstm Y0, Y15, STACK_Y0_Y15(%r15); - - /* Load counters for blocks 0-3. */ - vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15); - - lghi ROUND, 1; - j .Lfirst_output_4blks_8; - -.balign 4 -.Lsecond_output_4blks_8: - /* Load blocks 4-7. 
*/ - vlm X0, X15, STACK_Y0_Y15(%r15); - - /* Load counters for blocks 4-7. */ - vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15); - - lghi ROUND, 0; - -.balign 4 - /* Output four chacha20 blocks per loop. */ -.Lfirst_output_4blks_8: - vlm Y12, Y15, 0(INPUT); - PLUS(X12, Y0); - PLUS(X13, Y1); - vrepf Y0, Y12, 0; - vrepf Y1, Y12, 1; - vrepf Y2, Y12, 2; - vrepf Y3, Y12, 3; - vrepf Y4, Y13, 0; - vrepf Y5, Y13, 1; - vrepf Y6, Y13, 2; - vrepf Y7, Y13, 3; - vrepf Y8, Y14, 0; - vrepf Y9, Y14, 1; - vrepf Y10, Y14, 2; - vrepf Y11, Y14, 3; - vrepf Y14, Y15, 2; - vrepf Y15, Y15, 3; - PLUS(X0, Y0); - PLUS(X1, Y1); - PLUS(X2, Y2); - PLUS(X3, Y3); - PLUS(X4, Y4); - PLUS(X5, Y5); - PLUS(X6, Y6); - PLUS(X7, Y7); - PLUS(X8, Y8); - PLUS(X9, Y9); - PLUS(X10, Y10); - PLUS(X11, Y11); - PLUS(X14, Y14); - PLUS(X15, Y15); - - vl Y15, (.Lbswap32 - .Lconsts)(%r7); - TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7, - Y9, Y10, Y11, Y12, Y13, Y14); - TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15, - Y9, Y10, Y11, Y12, Y13, Y14); - - vlm Y0, Y14, 0(SRC); - vperm X0, X0, X0, Y15; - vperm X1, X1, X1, Y15; - vperm X2, X2, X2, Y15; - vperm X3, X3, X3, Y15; - vperm X4, X4, X4, Y15; - vperm X5, X5, X5, Y15; - vperm X6, X6, X6, Y15; - vperm X7, X7, X7, Y15; - vperm X8, X8, X8, Y15; - vperm X9, X9, X9, Y15; - vperm X10, X10, X10, Y15; - vperm X11, X11, X11, Y15; - vperm X12, X12, X12, Y15; - vperm X13, X13, X13, Y15; - vperm X14, X14, X14, Y15; - vperm X15, X15, X15, Y15; - vl Y15, (15 * 16)(SRC); - - XOR(Y0, X0); - XOR(Y1, X4); - XOR(Y2, X8); - XOR(Y3, X12); - XOR(Y4, X1); - XOR(Y5, X5); - XOR(Y6, X9); - XOR(Y7, X13); - XOR(Y8, X2); - XOR(Y9, X6); - XOR(Y10, X10); - XOR(Y11, X14); - XOR(Y12, X3); - XOR(Y13, X7); - XOR(Y14, X11); - XOR(Y15, X15); - vstm Y0, Y15, 0(DST); - - aghi SRC, 256; - aghi DST, 256; - - clgije ROUND, 1, .Lsecond_output_4blks_8; - - clgijhe NBLKS, 8, .Lloop8; - - - END_STACK(%r8); - xgr %r2, %r2; - br %r14; -END (__chacha20_s390x_vx_blocks8) - -#endif /* HAVE_S390_VX_ASM_SUPPORT */ diff 
--git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h deleted file mode 100644 index 0c6abf77e8..0000000000 --- a/sysdeps/s390/s390-64/chacha20_arch.h +++ /dev/null @@ -1,45 +0,0 @@ -/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <stdbool.h> -#include <ldsodefs.h> -#include <sys/auxv.h> - -unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static inline void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ -#ifdef HAVE_S390_VX_ASM_SUPPORT - _Static_assert (CHACHA20_BUFSIZE % 8 == 0, - "CHACHA20_BUFSIZE not multiple of 8"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); - - if (GLRO(dl_hwcap) & HWCAP_S390_VX) - { - __chacha20_s390x_vx_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); - return; - } -#endif - chacha20_crypt_generic (state, dst, src, bytes); -} diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile index 2ccc92b6b8..2f4f9784ee 100644 --- a/sysdeps/unix/sysv/linux/Makefile +++ b/sysdeps/unix/sysv/linux/Makefile @@ -380,7 +380,8 @@ sysdep_routines += xstatconv internal_statvfs \ open_nocancel open64_nocancel \ openat_nocancel openat64_nocancel \ read_nocancel pread64_nocancel \ - write_nocancel statx_cp stat_t64_cp + write_nocancel statx_cp stat_t64_cp \ + ppoll_nocancel sysdep_headers += bits/fcntl-linux.h diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions index 65d2ceda2c..febe1ad421 100644 --- a/sysdeps/unix/sysv/linux/Versions +++ b/sysdeps/unix/sysv/linux/Versions @@ -320,6 +320,7 @@ libc { __read_nocancel; __pread64_nocancel; __close_nocancel; + __ppoll_infinity_nocancel; __sigtimedwait; # functions used by nscd __netlink_assert_response; diff --git a/sysdeps/unix/sysv/linux/kernel-features.h b/sysdeps/unix/sysv/linux/kernel-features.h index 74adc3956b..75d5f953d4 100644 --- a/sysdeps/unix/sysv/linux/kernel-features.h +++ b/sysdeps/unix/sysv/linux/kernel-features.h @@ -236,4 +236,11 @@ # define __ASSUME_FUTEX_LOCK_PI2 0 #endif +/* The getrandom() syscall was added in 3.17. 
*/ +#if __LINUX_KERNEL_VERSION >= 0x031100 +# define __ASSUME_GETRANDOM 1 +#else +# define __ASSUME_GETRANDOM 0 +#endif + #endif /* kernel-features.h */ diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h index 2c58d5ae2f..d3df8fa79e 100644 --- a/sysdeps/unix/sysv/linux/not-cancel.h +++ b/sysdeps/unix/sysv/linux/not-cancel.h @@ -23,6 +23,7 @@ #include <sysdep.h> #include <errno.h> #include <unistd.h> +#include <sys/poll.h> #include <sys/syscall.h> #include <sys/wait.h> #include <time.h> @@ -77,6 +78,10 @@ __getrandom_nocancel (void *buf, size_t buflen, unsigned int flags) /* Uncancelable fcntl. */ __typeof (__fcntl) __fcntl64_nocancel; +/* Uncancelable ppoll. */ +int +__ppoll_infinity_nocancel (struct pollfd *fds, nfds_t nfds); + #if IS_IN (libc) || IS_IN (rtld) hidden_proto (__open_nocancel) hidden_proto (__open64_nocancel) @@ -87,6 +92,7 @@ hidden_proto (__pread64_nocancel) hidden_proto (__write_nocancel) hidden_proto (__close_nocancel) hidden_proto (__fcntl64_nocancel) +hidden_proto (__ppoll_infinity_nocancel) #endif #endif /* NOT_CANCEL_H */ diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/unix/sysv/linux/ppoll_nocancel.c similarity index 62% rename from sysdeps/generic/chacha20_arch.h rename to sysdeps/unix/sysv/linux/ppoll_nocancel.c index 1b4559ccbc..28c8761566 100644 --- a/sysdeps/generic/chacha20_arch.h +++ b/sysdeps/unix/sysv/linux/ppoll_nocancel.c @@ -1,5 +1,5 @@ -/* Chacha20 implementation, generic interface for encrypt. - Copyright (C) 2022 Free Software Foundation, Inc. +/* Linux ppoll syscall implementation -- non-cancellable. + Copyright (C) 2018-2022 Free Software Foundation, Inc. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or @@ -16,9 +16,16 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. 
*/ -static inline void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) +#include <unistd.h> +#include <sysdep-cancel.h> +#include <not-cancel.h> + +int +__ppoll_infinity_nocancel (struct pollfd *fds, nfds_t nfds) { - chacha20_crypt_generic (state, dst, src, bytes); +#ifndef __NR_ppoll_time64 +# define __NR_ppoll_time64 __NR_ppoll +#endif + return INLINE_SYSCALL_CALL (ppoll_time64, fds, nfds, NULL, NULL, 0); } +hidden_def (__ppoll_infinity_nocancel) diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c index 0326ebb767..c8a9ed2d40 100644 --- a/sysdeps/unix/sysv/linux/tls-internal.c +++ b/sysdeps/unix/sysv/linux/tls-internal.c @@ -16,7 +16,6 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. */ -#include <stdlib/arc4random.h> #include <string.h> #include <tls-internal.h> @@ -26,13 +25,4 @@ __glibc_tls_internal_free (void) struct pthread *self = THREAD_SELF; free (self->tls_state.strsignal_buf); free (self->tls_state.strerror_l_buf); - - if (self->tls_state.rand_state != NULL) - { - /* Clear any lingering random state prior so if the thread stack is - cached it won't leak any data. */ - explicit_bzero (self->tls_state.rand_state, - sizeof (*self->tls_state.rand_state)); - free (self->tls_state.rand_state); - } } diff --git a/sysdeps/unix/sysv/linux/tls-internal.h b/sysdeps/unix/sysv/linux/tls-internal.h index ebc65d896a..2ebe977802 100644 --- a/sysdeps/unix/sysv/linux/tls-internal.h +++ b/sysdeps/unix/sysv/linux/tls-internal.h @@ -28,7 +28,6 @@ __glibc_tls_internal (void) return &THREAD_SELF->tls_state; } -/* Reset the arc4random TCB state on fork. 
*/ extern void __glibc_tls_internal_free (void) attribute_hidden; #endif diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile index 1178475d75..c19bef2dec 100644 --- a/sysdeps/x86_64/Makefile +++ b/sysdeps/x86_64/Makefile @@ -5,13 +5,6 @@ ifeq ($(subdir),csu) gen-as-const-headers += link-defines.sym endif -ifeq ($(subdir),stdlib) -sysdep_routines += \ - chacha20-amd64-sse2 \ - chacha20-amd64-avx2 \ - # sysdep_routines -endif - ifeq ($(subdir),gmon) sysdep_routines += _mcount # We cannot compile _mcount.S with -pg because that would create diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S deleted file mode 100644 index aefd1cdbd0..0000000000 --- a/sysdeps/x86_64/chacha20-amd64-avx2.S +++ /dev/null @@ -1,328 +0,0 @@ -/* Optimized AVX2 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-amd64-avx2.S - AVX2 implementation of ChaCha20 cipher - - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. 
- - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. -*/ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. */ - -#include <sysdep.h> - -#ifdef PIC -# define rRIP (%rip) -#else -# define rRIP -#endif - -/* register macros */ -#define INPUT %rdi -#define DST %rsi -#define SRC %rdx -#define NBLKS %rcx -#define ROUND %eax - -/* stack structure */ -#define STACK_VEC_X12 (32) -#define STACK_VEC_X13 (32 + STACK_VEC_X12) -#define STACK_TMP (32 + STACK_VEC_X13) -#define STACK_TMP1 (32 + STACK_TMP) - -#define STACK_MAX (32 + STACK_TMP1) - -/* vector registers */ -#define X0 %ymm0 -#define X1 %ymm1 -#define X2 %ymm2 -#define X3 %ymm3 -#define X4 %ymm4 -#define X5 %ymm5 -#define X6 %ymm6 -#define X7 %ymm7 -#define X8 %ymm8 -#define X9 %ymm9 -#define X10 %ymm10 -#define X11 %ymm11 -#define X12 %ymm12 -#define X13 %ymm13 -#define X14 %ymm14 -#define X15 %ymm15 - -#define X0h %xmm0 -#define X1h %xmm1 -#define X2h %xmm2 -#define X3h %xmm3 -#define X4h %xmm4 -#define X5h %xmm5 -#define X6h %xmm6 -#define X7h %xmm7 -#define X8h %xmm8 -#define X9h %xmm9 -#define X10h %xmm10 -#define X11h %xmm11 -#define X12h %xmm12 -#define X13h %xmm13 -#define X14h %xmm14 -#define X15h %xmm15 - -/********************************************************************** - helper macros - 
**********************************************************************/ - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0,x1,x2,x3,t1,t2) \ - vpunpckhdq x1, x0, t2; \ - vpunpckldq x1, x0, x0; \ - \ - vpunpckldq x3, x2, t1; \ - vpunpckhdq x3, x2, x2; \ - \ - vpunpckhqdq t1, x0, x1; \ - vpunpcklqdq t1, x0, x0; \ - \ - vpunpckhqdq x2, t2, x3; \ - vpunpcklqdq x2, t2, x2; - -/* 2x2 128-bit matrix transpose */ -#define transpose_16byte_2x2(x0,x1,t1) \ - vmovdqa x0, t1; \ - vperm2i128 $0x20, x1, x0, x0; \ - vperm2i128 $0x31, x1, t1, x1; - -/********************************************************************** - 8-way chacha20 - **********************************************************************/ - -#define ROTATE2(v1,v2,c,tmp) \ - vpsrld $(32 - (c)), v1, tmp; \ - vpslld $(c), v1, v1; \ - vpaddb tmp, v1, v1; \ - vpsrld $(32 - (c)), v2, tmp; \ - vpslld $(c), v2, v2; \ - vpaddb tmp, v2, v2; - -#define ROTATE_SHUF_2(v1,v2,shuf) \ - vpshufb shuf, v1, v1; \ - vpshufb shuf, v2, v2; - -#define XOR(ds,s) \ - vpxor s, ds, ds; - -#define PLUS(ds,s) \ - vpaddd s, ds, ds; - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\ - interleave_op1,interleave_op2,\ - interleave_op3,interleave_op4) \ - vbroadcasti128 .Lshuf_rol16 rRIP, tmp1; \ - interleave_op1; \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE_SHUF_2(d1, d2, tmp1); \ - interleave_op2; \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 12, tmp1); \ - vbroadcasti128 .Lshuf_rol8 rRIP, tmp1; \ - interleave_op3; \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE_SHUF_2(d1, d2, tmp1); \ - interleave_op4; \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 7, tmp1); - - .section .text.avx2, "ax", @progbits - .align 32 -chacha20_data: -L(shuf_rol16): - .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13 -L(shuf_rol8): - .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14 -L(inc_counter): - .byte 0,1,2,3,4,5,6,7 -L(unsigned_cmp): - .long 
0x80000000 - - .hidden __chacha20_avx2_blocks8 -ENTRY (__chacha20_avx2_blocks8) - /* input: - * %rdi: input - * %rsi: dst - * %rdx: src - * %rcx: nblks (multiple of 8) - */ - vzeroupper; - - pushq %rbp; - cfi_adjust_cfa_offset(8); - cfi_rel_offset(rbp, 0) - movq %rsp, %rbp; - cfi_def_cfa_register(rbp); - - subq $STACK_MAX, %rsp; - andq $~31, %rsp; - -L(loop8): - mov $20, ROUND; - - /* Construct counter vectors X12 and X13 */ - vpmovzxbd L(inc_counter) rRIP, X0; - vpbroadcastd L(unsigned_cmp) rRIP, X2; - vpbroadcastd (12 * 4)(INPUT), X12; - vpbroadcastd (13 * 4)(INPUT), X13; - vpaddd X0, X12, X12; - vpxor X2, X0, X0; - vpxor X2, X12, X1; - vpcmpgtd X1, X0, X0; - vpsubd X0, X13, X13; - vmovdqa X12, (STACK_VEC_X12)(%rsp); - vmovdqa X13, (STACK_VEC_X13)(%rsp); - - /* Load vectors */ - vpbroadcastd (0 * 4)(INPUT), X0; - vpbroadcastd (1 * 4)(INPUT), X1; - vpbroadcastd (2 * 4)(INPUT), X2; - vpbroadcastd (3 * 4)(INPUT), X3; - vpbroadcastd (4 * 4)(INPUT), X4; - vpbroadcastd (5 * 4)(INPUT), X5; - vpbroadcastd (6 * 4)(INPUT), X6; - vpbroadcastd (7 * 4)(INPUT), X7; - vpbroadcastd (8 * 4)(INPUT), X8; - vpbroadcastd (9 * 4)(INPUT), X9; - vpbroadcastd (10 * 4)(INPUT), X10; - vpbroadcastd (11 * 4)(INPUT), X11; - vpbroadcastd (14 * 4)(INPUT), X14; - vpbroadcastd (15 * 4)(INPUT), X15; - vmovdqa X15, (STACK_TMP)(%rsp); - -L(round2): - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X15,,,,) - vmovdqa (STACK_TMP)(%rsp), X15; - vmovdqa X8, (STACK_TMP)(%rsp); - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,,,,) - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,,,,) - vmovdqa (STACK_TMP)(%rsp), X8; - vmovdqa X15, (STACK_TMP)(%rsp); - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X15,,,,) - sub $2, ROUND; - jnz L(round2); - - vmovdqa X8, (STACK_TMP1)(%rsp); - - /* tmp := X15 */ - vpbroadcastd (0 * 4)(INPUT), X15; - PLUS(X0, X15); - vpbroadcastd (1 * 4)(INPUT), X15; - PLUS(X1, X15); - vpbroadcastd (2 * 4)(INPUT), X15; - PLUS(X2, X15); - vpbroadcastd (3 
* 4)(INPUT), X15; - PLUS(X3, X15); - vpbroadcastd (4 * 4)(INPUT), X15; - PLUS(X4, X15); - vpbroadcastd (5 * 4)(INPUT), X15; - PLUS(X5, X15); - vpbroadcastd (6 * 4)(INPUT), X15; - PLUS(X6, X15); - vpbroadcastd (7 * 4)(INPUT), X15; - PLUS(X7, X15); - transpose_4x4(X0, X1, X2, X3, X8, X15); - transpose_4x4(X4, X5, X6, X7, X8, X15); - vmovdqa (STACK_TMP1)(%rsp), X8; - transpose_16byte_2x2(X0, X4, X15); - transpose_16byte_2x2(X1, X5, X15); - transpose_16byte_2x2(X2, X6, X15); - transpose_16byte_2x2(X3, X7, X15); - vmovdqa (STACK_TMP)(%rsp), X15; - vmovdqu X0, (64 * 0 + 16 * 0)(DST) - vmovdqu X1, (64 * 1 + 16 * 0)(DST) - vpbroadcastd (8 * 4)(INPUT), X0; - PLUS(X8, X0); - vpbroadcastd (9 * 4)(INPUT), X0; - PLUS(X9, X0); - vpbroadcastd (10 * 4)(INPUT), X0; - PLUS(X10, X0); - vpbroadcastd (11 * 4)(INPUT), X0; - PLUS(X11, X0); - vmovdqa (STACK_VEC_X12)(%rsp), X0; - PLUS(X12, X0); - vmovdqa (STACK_VEC_X13)(%rsp), X0; - PLUS(X13, X0); - vpbroadcastd (14 * 4)(INPUT), X0; - PLUS(X14, X0); - vpbroadcastd (15 * 4)(INPUT), X0; - PLUS(X15, X0); - vmovdqu X2, (64 * 2 + 16 * 0)(DST) - vmovdqu X3, (64 * 3 + 16 * 0)(DST) - - /* Update counter */ - addq $8, (12 * 4)(INPUT); - - transpose_4x4(X8, X9, X10, X11, X0, X1); - transpose_4x4(X12, X13, X14, X15, X0, X1); - vmovdqu X4, (64 * 4 + 16 * 0)(DST) - vmovdqu X5, (64 * 5 + 16 * 0)(DST) - transpose_16byte_2x2(X8, X12, X0); - transpose_16byte_2x2(X9, X13, X0); - transpose_16byte_2x2(X10, X14, X0); - transpose_16byte_2x2(X11, X15, X0); - vmovdqu X6, (64 * 6 + 16 * 0)(DST) - vmovdqu X7, (64 * 7 + 16 * 0)(DST) - vmovdqu X8, (64 * 0 + 16 * 2)(DST) - vmovdqu X9, (64 * 1 + 16 * 2)(DST) - vmovdqu X10, (64 * 2 + 16 * 2)(DST) - vmovdqu X11, (64 * 3 + 16 * 2)(DST) - vmovdqu X12, (64 * 4 + 16 * 2)(DST) - vmovdqu X13, (64 * 5 + 16 * 2)(DST) - vmovdqu X14, (64 * 6 + 16 * 2)(DST) - vmovdqu X15, (64 * 7 + 16 * 2)(DST) - - sub $8, NBLKS; - lea (8 * 64)(DST), DST; - lea (8 * 64)(SRC), SRC; - jnz L(loop8); - - vzeroupper; - - /* eax zeroed by round loop. 
*/ - leave; - cfi_adjust_cfa_offset(-8) - cfi_def_cfa_register(%rsp); - ret; - int3; -END(__chacha20_avx2_blocks8) diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S deleted file mode 100644 index 351a1109c6..0000000000 --- a/sysdeps/x86_64/chacha20-amd64-sse2.S +++ /dev/null @@ -1,311 +0,0 @@ -/* Optimized SSE2 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-amd64-ssse3.S - SSSE3 implementation of ChaCha20 cipher - - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. 
-*/ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. */ - -#include <sysdep.h> -#include <isa-level.h> - -#if MINIMUM_X86_ISA_LEVEL <= 2 - -#ifdef PIC -# define rRIP (%rip) -#else -# define rRIP -#endif - -/* 'ret' instruction replacement for straight-line speculation mitigation */ -#define ret_spec_stop \ - ret; int3; - -/* register macros */ -#define INPUT %rdi -#define DST %rsi -#define SRC %rdx -#define NBLKS %rcx -#define ROUND %eax - -/* stack structure */ -#define STACK_VEC_X12 (16) -#define STACK_VEC_X13 (16 + STACK_VEC_X12) -#define STACK_TMP (16 + STACK_VEC_X13) -#define STACK_TMP1 (16 + STACK_TMP) -#define STACK_TMP2 (16 + STACK_TMP1) - -#define STACK_MAX (16 + STACK_TMP2) - -/* vector registers */ -#define X0 %xmm0 -#define X1 %xmm1 -#define X2 %xmm2 -#define X3 %xmm3 -#define X4 %xmm4 -#define X5 %xmm5 -#define X6 %xmm6 -#define X7 %xmm7 -#define X8 %xmm8 -#define X9 %xmm9 -#define X10 %xmm10 -#define X11 %xmm11 -#define X12 %xmm12 -#define X13 %xmm13 -#define X14 %xmm14 -#define X15 %xmm15 - -/********************************************************************** - helper macros - **********************************************************************/ - -/* 4x4 32-bit integer matrix transpose */ -#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \ - movdqa x0, t2; \ - punpckhdq x1, t2; \ - punpckldq x1, x0; \ - \ - movdqa x2, t1; \ - punpckldq x3, t1; \ - punpckhdq x3, x2; \ - \ - movdqa x0, x1; \ - punpckhqdq t1, x1; \ - punpcklqdq t1, x0; \ - \ - movdqa t2, x3; \ - punpckhqdq x2, x3; \ - punpcklqdq x2, t2; \ - movdqa t2, x2; - -/* fill xmm register with 32-bit value from memory */ -#define PBROADCASTD(mem32, xreg) \ - movd mem32, xreg; \ - pshufd $0, xreg, xreg; - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define 
ROTATE2(v1,v2,c,tmp1,tmp2) \ - movdqa v1, tmp1; \ - movdqa v2, tmp2; \ - psrld $(32 - (c)), v1; \ - pslld $(c), tmp1; \ - paddb tmp1, v1; \ - psrld $(32 - (c)), v2; \ - pslld $(c), tmp2; \ - paddb tmp2, v2; - -#define XOR(ds,s) \ - pxor s, ds; - -#define PLUS(ds,s) \ - paddd s, ds; - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2) \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE2(d1, d2, 16, tmp1, tmp2); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 12, tmp1, tmp2); \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE2(d1, d2, 8, tmp1, tmp2); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 7, tmp1, tmp2); - - .section .text.sse2,"ax",@progbits - -chacha20_data: - .align 16 -L(counter1): - .long 1,0,0,0 -L(inc_counter): - .long 0,1,2,3 -L(unsigned_cmp): - .long 0x80000000,0x80000000,0x80000000,0x80000000 - - .hidden __chacha20_sse2_blocks4 -ENTRY (__chacha20_sse2_blocks4) - /* input: - * %rdi: input - * %rsi: dst - * %rdx: src - * %rcx: nblks (multiple of 4) - */ - - pushq %rbp; - cfi_adjust_cfa_offset(8); - cfi_rel_offset(rbp, 0) - movq %rsp, %rbp; - cfi_def_cfa_register(%rbp); - - subq $STACK_MAX, %rsp; - andq $~15, %rsp; - -L(loop4): - mov $20, ROUND; - - /* Construct counter vectors X12 and X13 */ - movdqa L(inc_counter) rRIP, X0; - movdqa L(unsigned_cmp) rRIP, X2; - PBROADCASTD((12 * 4)(INPUT), X12); - PBROADCASTD((13 * 4)(INPUT), X13); - paddd X0, X12; - movdqa X12, X1; - pxor X2, X0; - pxor X2, X1; - pcmpgtd X1, X0; - psubd X0, X13; - movdqa X12, (STACK_VEC_X12)(%rsp); - movdqa X13, (STACK_VEC_X13)(%rsp); - - /* Load vectors */ - PBROADCASTD((0 * 4)(INPUT), X0); - PBROADCASTD((1 * 4)(INPUT), X1); - PBROADCASTD((2 * 4)(INPUT), X2); - PBROADCASTD((3 * 4)(INPUT), X3); - PBROADCASTD((4 * 4)(INPUT), X4); - PBROADCASTD((5 * 4)(INPUT), X5); - PBROADCASTD((6 * 4)(INPUT), X6); - PBROADCASTD((7 * 4)(INPUT), X7); - PBROADCASTD((8 * 4)(INPUT), X8); - PBROADCASTD((9 * 
4)(INPUT), X9); - PBROADCASTD((10 * 4)(INPUT), X10); - PBROADCASTD((11 * 4)(INPUT), X11); - PBROADCASTD((14 * 4)(INPUT), X14); - PBROADCASTD((15 * 4)(INPUT), X15); - movdqa X11, (STACK_TMP)(%rsp); - movdqa X15, (STACK_TMP1)(%rsp); - -L(round2_4): - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X11,X15) - movdqa (STACK_TMP)(%rsp), X11; - movdqa (STACK_TMP1)(%rsp), X15; - movdqa X8, (STACK_TMP)(%rsp); - movdqa X9, (STACK_TMP1)(%rsp); - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,X9) - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,X9) - movdqa (STACK_TMP)(%rsp), X8; - movdqa (STACK_TMP1)(%rsp), X9; - movdqa X11, (STACK_TMP)(%rsp); - movdqa X15, (STACK_TMP1)(%rsp); - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X11,X15) - sub $2, ROUND; - jnz L(round2_4); - - /* tmp := X15 */ - movdqa (STACK_TMP)(%rsp), X11; - PBROADCASTD((0 * 4)(INPUT), X15); - PLUS(X0, X15); - PBROADCASTD((1 * 4)(INPUT), X15); - PLUS(X1, X15); - PBROADCASTD((2 * 4)(INPUT), X15); - PLUS(X2, X15); - PBROADCASTD((3 * 4)(INPUT), X15); - PLUS(X3, X15); - PBROADCASTD((4 * 4)(INPUT), X15); - PLUS(X4, X15); - PBROADCASTD((5 * 4)(INPUT), X15); - PLUS(X5, X15); - PBROADCASTD((6 * 4)(INPUT), X15); - PLUS(X6, X15); - PBROADCASTD((7 * 4)(INPUT), X15); - PLUS(X7, X15); - PBROADCASTD((8 * 4)(INPUT), X15); - PLUS(X8, X15); - PBROADCASTD((9 * 4)(INPUT), X15); - PLUS(X9, X15); - PBROADCASTD((10 * 4)(INPUT), X15); - PLUS(X10, X15); - PBROADCASTD((11 * 4)(INPUT), X15); - PLUS(X11, X15); - movdqa (STACK_VEC_X12)(%rsp), X15; - PLUS(X12, X15); - movdqa (STACK_VEC_X13)(%rsp), X15; - PLUS(X13, X15); - movdqa X13, (STACK_TMP)(%rsp); - PBROADCASTD((14 * 4)(INPUT), X15); - PLUS(X14, X15); - movdqa (STACK_TMP1)(%rsp), X15; - movdqa X14, (STACK_TMP1)(%rsp); - PBROADCASTD((15 * 4)(INPUT), X13); - PLUS(X15, X13); - movdqa X15, (STACK_TMP2)(%rsp); - - /* Update counter */ - addq $4, (12 * 4)(INPUT); - - TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15); - movdqu X0, (64 * 0 + 16 * 0)(DST) - 
movdqu X1, (64 * 1 + 16 * 0)(DST) - movdqu X2, (64 * 2 + 16 * 0)(DST) - movdqu X3, (64 * 3 + 16 * 0)(DST) - TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2); - movdqa (STACK_TMP)(%rsp), X13; - movdqa (STACK_TMP1)(%rsp), X14; - movdqa (STACK_TMP2)(%rsp), X15; - movdqu X4, (64 * 0 + 16 * 1)(DST) - movdqu X5, (64 * 1 + 16 * 1)(DST) - movdqu X6, (64 * 2 + 16 * 1)(DST) - movdqu X7, (64 * 3 + 16 * 1)(DST) - TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2); - movdqu X8, (64 * 0 + 16 * 2)(DST) - movdqu X9, (64 * 1 + 16 * 2)(DST) - movdqu X10, (64 * 2 + 16 * 2)(DST) - movdqu X11, (64 * 3 + 16 * 2)(DST) - TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2); - movdqu X12, (64 * 0 + 16 * 3)(DST) - movdqu X13, (64 * 1 + 16 * 3)(DST) - movdqu X14, (64 * 2 + 16 * 3)(DST) - movdqu X15, (64 * 3 + 16 * 3)(DST) - - sub $4, NBLKS; - lea (4 * 64)(DST), DST; - lea (4 * 64)(SRC), SRC; - jnz L(loop4); - - /* eax zeroed by round loop. */ - leave; - cfi_adjust_cfa_offset(-8) - cfi_def_cfa_register(%rsp); - ret_spec_stop; -END (__chacha20_sse2_blocks4) - -#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */ diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h deleted file mode 100644 index 6f3784e392..0000000000 --- a/sysdeps/x86_64/chacha20_arch.h +++ /dev/null @@ -1,55 +0,0 @@ -/* Chacha20 implementation, used on arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. 
- - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <isa-level.h> -#include <ldsodefs.h> -#include <cpu-features.h> -#include <sys/param.h> - -unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; -unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static inline void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0, - "CHACHA20_BUFSIZE not multiple of 4 or 8"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); - -#if MINIMUM_X86_ISA_LEVEL > 2 - __chacha20_avx2_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -#else - const struct cpu_features* cpu_features = __get_cpu_features (); - - /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. */ - if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) - && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !)) - __chacha20_avx2_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); - else - __chacha20_sse2_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -#endif -} -- 2.35.1 ^ permalink raw reply [flat|nested] 81+ messages in thread
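A note on the `kernel-features.h` hunk in the patch above: the constant `0x031100` packs a kernel version as major, minor, and patch in successive bytes, so it reads as 3.17.0 (0x11 is 17). The following is only a minimal sketch of that encoding; `PACK_KERNEL_VERSION` and `assume_getrandom` are hypothetical names chosen for illustration and are not glibc identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical macro: pack major.minor.patch into one integer, one
   byte per component, which is the layout 0x031100 follows.  */
#define PACK_KERNEL_VERSION(major, minor, patch) \
  (((uint32_t) (major) << 16) | ((uint32_t) (minor) << 8) \
   | (uint32_t) (patch))

/* Hypothetical helper mirroring the #if in the patch: getrandom() can
   be assumed only if the minimum supported kernel is 3.17 or newer.  */
static bool
assume_getrandom (uint32_t min_kernel)
{
  return min_kernel >= PACK_KERNEL_VERSION (3, 17, 0);
}
```

With a minimum kernel of 3.2.0 (packed as 0x030200) the helper returns false, which corresponds to `__ASSUME_GETRANDOM` being 0 and the fallback path staying compiled in.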
* Re: [PATCH v4] arc4random: simplify design for better safety
  2022-07-26 13:30 ` [PATCH v4] " Jason A. Donenfeld
@ 2022-07-26 15:21 ` Yann Droneaud
  2022-07-26 16:20 ` Adhemerval Zanella Netto
  2022-07-26 19:08 ` [PATCH v5] " Jason A. Donenfeld
  2 siblings, 0 replies; 81+ messages in thread
From: Yann Droneaud @ 2022-07-26 15:21 UTC (permalink / raw)
To: Jason A. Donenfeld, libc-alpha; +Cc: Florian Weimer, Eric Biggers, linux-crypto

Hi,

On 26/07/2022 at 15:30, Jason A. Donenfeld via Libc-alpha wrote:
> Rather than buffering 16 MiB of entropy in userspace (by way of
> chacha20), simply call getrandom() every time.

I dislike the wording because 1) the current buffer is only 512 bytes, not 16 MiB; 2) the implementation reads only 48 bytes of "fresh" entropy from getrandom() for every 16 MiB generated.

I think "stirring" or "streaming" would better describe what's happening:

"Rather than stirring 16 MiB of random data in userspace before reseeding"

> This approach is doubtlessly slower, for now, but trying to prematurely
> optimize arc4random appears to be leading toward all sorts of nasty
> properties and gotchas. Instead, this patch takes a much more
> conservative approach. The interface is added as a basic loop wrapper
> around getrandom(), and then later, the kernel and libc together can
> work together on optimizing that.
>
> This prevents numerous issues in which userspace is unaware of when it
> really must throw away its buffer, since we avoid buffering all
> together.

I believe the cloned virtual machine issue should be explicitly described as a major blocker in the commit message.

> Future improvements may include userspace learning more from
> the kernel about when to do that, which might make these sorts of
> chacha20-based optimizations more possible. The current heuristic of 16
> MiB is meaningless garbage that doesn't correspond to anything the
> kernel might know about. So for now, let's just do something
> conservative that we know is correct and won't lead to cryptographic
> issues for users of this function.
>
> This patch might be considered along the lines of, "optimization is the
> root of all evil," in that the much more complex implementation it
> replaces moves too fast without considering security implications,
> whereas the incremental approach done here is a much safer way of going
> about things. Once this lands, we can take our time in optimizing this
> properly using new interplay between the kernel and userspace.
>
> getrandom(0) is used, since that's the one that ensures the bytes
> returned are cryptographically secure. But on systems without it, we
> fallback to using /dev/urandom. This is unfortunate because it means
> opening a file descriptor, but there's not much of a choice. Secondly,
> as part of the fallback, in order to get more or less the same
> properties of getrandom(0), we poll on /dev/random, and if the poll
> succeeds at least once, then we assume the RNG is initialized. This is a
> rough approximation, as the ancient "non-blocking pool" initialized
> after the "blocking pool", not before, and it may not port back to all
> ancient kernels, but it does to a decent swath of them, so generally
> it's the best approximation we can do.
>
> The motivation for including arc4random, in the first place, is to have
> source-level compatibility with existing code. That means this patch
> doesn't attempt to litigate the interface itself. It does, however,
> choose a conservative approach for implementing it.

Sure, the arc4random() interface is inherited from *BSD, so we're not free to improve it. But arc4random() is already in glibc git, so I think this paragraph is of dubious value in the commit message and can be removed.

> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Florian Weimer <fweimer@redhat.com>
> Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
> Cc: Paul Eggert <eggert@cs.ucla.edu>
> Cc: Mark Harris <mark.hsj@gmail.com>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Cc: linux-crypto@vger.kernel.org
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  LICENSES                                      |  23 -
>  NEWS                                          |   4 +-
>  include/stdlib.h                              |   3 -
>  manual/math.texi                              |  13 +-
>  stdlib/Makefile                               |   2 -
>  stdlib/arc4random.c                           | 205 ++-----
>  stdlib/arc4random.h                           |  48 --
>  stdlib/chacha20.c                             | 191 ------
>  stdlib/tst-arc4random-chacha20.c              | 167 -----
>  sysdeps/aarch64/Makefile                      |   4 -
>  sysdeps/aarch64/chacha20-aarch64.S            | 314 ----------
>  sysdeps/aarch64/chacha20_arch.h               |  40 --
>  sysdeps/generic/tls-internal-struct.h         |   1 -
>  sysdeps/generic/tls-internal.c                |  10 -
>  sysdeps/mach/hurd/_Fork.c                     |   2 -
>  sysdeps/mach/hurd/kernel-features.h           |   1 +
>  sysdeps/nptl/_Fork.c                          |   2 -
>  .../powerpc/powerpc64/be/multiarch/Makefile   |   4 -
>  .../powerpc64/be/multiarch/chacha20-ppc.c     |   1 -
>  .../powerpc64/be/multiarch/chacha20_arch.h    |  42 --
>  sysdeps/powerpc/powerpc64/power8/Makefile     |   5 -
>  .../powerpc/powerpc64/power8/chacha20-ppc.c   | 256 --------
>  .../powerpc/powerpc64/power8/chacha20_arch.h  |  37 --
>  sysdeps/s390/s390-64/Makefile                 |   6 -
>  sysdeps/s390/s390-64/chacha20-s390x.S         | 573 ------------------
>  sysdeps/s390/s390-64/chacha20_arch.h          |  45 --
>  sysdeps/unix/sysv/linux/Makefile              |   3 +-
>  sysdeps/unix/sysv/linux/Versions              |   1 +
>  sysdeps/unix/sysv/linux/kernel-features.h     |   7 +
>  sysdeps/unix/sysv/linux/not-cancel.h          |   6 +
>  .../sysv/linux/ppoll_nocancel.c}              |  19 +-
>  sysdeps/unix/sysv/linux/tls-internal.c        |  10 -
>  sysdeps/unix/sysv/linux/tls-internal.h        |   1 -
>  sysdeps/x86_64/Makefile                       |   7 -
>  sysdeps/x86_64/chacha20-amd64-avx2.S          | 328 ----------
>  sysdeps/x86_64/chacha20-amd64-sse2.S          | 311 ----------
>  sysdeps/x86_64/chacha20_arch.h                |  55 --
>  37 files changed, 89 insertions(+), 2658 deletions(-)
>  delete mode 100644 stdlib/arc4random.h
>  delete mode 100644 stdlib/chacha20.c
>  delete mode 100644 stdlib/tst-arc4random-chacha20.c
>  delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S
>  delete mode 100644 sysdeps/aarch64/chacha20_arch.h
>  delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
>  delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
>  delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
>  delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
>  delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
>  delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
>  delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
>  rename sysdeps/{generic/chacha20_arch.h => unix/sysv/linux/ppoll_nocancel.c} (62%)
>  delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
>  delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
>  delete mode 100644 sysdeps/x86_64/chacha20_arch.h
>
> diff --git a/manual/math.texi b/manual/math.texi
> index 141695cc30..6d69bbff66 100644
> --- a/manual/math.texi
> +++ b/manual/math.texi
> @@ -1993,17 +1993,10 @@ This section describes the random number functions provided as a GNU
>  extension, based on OpenBSD interfaces.
>
>  @Theglibc{} uses kernel entropy obtained either through @code{getrandom}
> -or by reading @file{/dev/urandom} to seed and periodically re-seed the
> -internal state. A per-thread data pool is used, which allows fast output
> -generation.
> +or by reading @file{/dev/urandom} to seed.
>
> -Although these functions provide higher random quality than ISO, BSD, and
> -SVID functions, these still use a Pseudo-Random generator and should not
> -be used in cryptographic contexts.
> -
> -The internal state is cleared and reseeded with kernel entropy on @code{fork}
> -and @code{_Fork}. It is not cleared on either a direct @code{clone} syscall
> -or when using @theglibc{} @code{syscall} function.
> +These functions provide higher random quality than ISO, BSD, and SVID
> +functions, and may be used in cryptographic contexts.

+ "provided getrandom() and /dev/urandom could be used in such context." ;)

Thanks for the improvements, can't wait for a vDSO getrandom() optimized for reading 1, 2, 4, or 8 bytes :)

Regards.

-- 
Yann Droneaud
OPTEYA

^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v4] arc4random: simplify design for better safety
  2022-07-26 13:30 ` [PATCH v4] " Jason A. Donenfeld
  2022-07-26 15:21 ` Yann Droneaud
@ 2022-07-26 16:20 ` Adhemerval Zanella Netto
  2022-07-26 18:36 ` Jason A. Donenfeld
  2022-07-26 19:08 ` [PATCH v5] " Jason A. Donenfeld
  2 siblings, 1 reply; 81+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-26 16:20 UTC (permalink / raw)
To: Jason A. Donenfeld, libc-alpha
Cc: Florian Weimer, Cristian Rodríguez, Paul Eggert, Mark Harris, Eric Biggers, linux-crypto

On 26/07/22 10:30, Jason A. Donenfeld wrote:
> +      l = __getrandom_nocancel (p, n, 0);
> +      if (l > 0)
> +        {
> +          if ((size_t) l == n)
> +            return; /* Done reading, success. */
> +          p = (uint8_t *) p + l;
> +          n -= l;
> +          continue; /* Interrupted by a signal; keep going. */
> +        }
> +      else if (l == 0)
> +        arc4random_getrandom_failure (); /* Weird, should never happen. */
> +      else if (l == -EINTR)
> +        continue; /* Interrupted by a signal; keep going. */
> +      else if (!__ASSUME_GETRANDOM && l == -ENOSYS)
> +        {
> +          atomic_store_relaxed (&have_getrandom, false);

I still think there is not much gain in this optimization: the syscall will most likely be present, and dropping the flag means one less piece of static data. Also, we avoid using __ASSUME_GETRANDOM in generic code (all __ASSUME usage is within sysdeps and/or nptl).

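To make the suggestion concrete, here is a rough sketch of a getrandom() loop that does not cache the ENOSYS result in static data; it reports the missing syscall to the caller on each call instead. It is an assumption-laden illustration rather than the patch's code: `try_getrandom` is a hypothetical name, and it uses the public `getrandom` wrapper from `<sys/random.h>` (which reports errors through `errno`) in place of the internal `__getrandom_nocancel` (which returns a negative error code).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: fill BUF with N random bytes.  Returns false
   if the kernel lacks getrandom(), so the caller can take a fallback
   path; no static "have_getrandom" flag is kept.  */
static bool
try_getrandom (void *buf, size_t n)
{
  uint8_t *p = buf;

  while (n > 0)
    {
      /* Flags of 0 block until the kernel RNG is initialized.  */
      ssize_t l = getrandom (p, n, 0);
      if (l > 0)
        {
          /* Possibly a short read; advance and keep going.  */
          p += l;
          n -= (size_t) l;
        }
      else if (l < 0 && errno == EINTR)
        continue;               /* Interrupted by a signal.  */
      else if (l < 0 && errno == ENOSYS)
        return false;           /* Syscall not available.  */
      else
        abort ();               /* Should never happen.  */
    }
  return true;
}
```

The trade-off being debated is visible here: without the cached flag, a kernel older than 3.17 pays for one failing syscall per request before reaching the fallback, while newer kernels never touch any shared state.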
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index 2ccc92b6b8..2f4f9784ee 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -380,7 +380,8 @@ sysdep_routines += xstatconv internal_statvfs \
>                     open_nocancel open64_nocancel \
>                     openat_nocancel openat64_nocancel \
>                     read_nocancel pread64_nocancel \
> -                   write_nocancel statx_cp stat_t64_cp
> +                   write_nocancel statx_cp stat_t64_cp \
> +                   ppoll_nocancel
>
>  sysdep_headers += bits/fcntl-linux.h
>
> diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
> index 65d2ceda2c..febe1ad421 100644
> --- a/sysdeps/unix/sysv/linux/Versions
> +++ b/sysdeps/unix/sysv/linux/Versions
> @@ -320,6 +320,7 @@ libc {
>      __read_nocancel;
>      __pread64_nocancel;
>      __close_nocancel;
> +    __ppoll_infinity_nocancel;
>      __sigtimedwait;
>      # functions used by nscd
>      __netlink_assert_response;

There is no need to export it on GLIBC_PRIVATE, since it is not currently
used outside libc.so. Just define it as hidden (attribute_hidden).

> diff --git a/sysdeps/unix/sysv/linux/kernel-features.h b/sysdeps/unix/sysv/linux/kernel-features.h
> index 74adc3956b..75d5f953d4 100644
> --- a/sysdeps/unix/sysv/linux/kernel-features.h
> +++ b/sysdeps/unix/sysv/linux/kernel-features.h
> @@ -236,4 +236,11 @@
>  # define __ASSUME_FUTEX_LOCK_PI2 0
>  #endif
>
> +/* The getrandom() syscall was added in 3.17. */
> +#if __LINUX_KERNEL_VERSION >= 0x031100
> +# define __ASSUME_GETRANDOM 1
> +#else
> +# define __ASSUME_GETRANDOM 0
> +#endif
> +
>  #endif /* kernel-features.h */
> diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h
> index 2c58d5ae2f..d3df8fa79e 100644
> --- a/sysdeps/unix/sysv/linux/not-cancel.h
> +++ b/sysdeps/unix/sysv/linux/not-cancel.h
> @@ -23,6 +23,7 @@
>  #include <sysdep.h>
>  #include <errno.h>
>  #include <unistd.h>
> +#include <sys/poll.h>
>  #include <sys/syscall.h>
>  #include <sys/wait.h>
>  #include <time.h>
> @@ -77,6 +78,10 @@ __getrandom_nocancel (void *buf, size_t buflen, unsigned int flags)
>  /* Uncancelable fcntl. */
>  __typeof (__fcntl) __fcntl64_nocancel;
>
> +/* Uncancelable ppoll. */
> +int
> +__ppoll_infinity_nocancel (struct pollfd *fds, nfds_t nfds);

Use attribute_hidden here and remove it from sysdeps/unix/sysv/linux/Versions.

> +
>  #if IS_IN (libc) || IS_IN (rtld)
>  hidden_proto (__open_nocancel)
>  hidden_proto (__open64_nocancel)
> @@ -87,6 +92,7 @@ hidden_proto (__pread64_nocancel)
>  hidden_proto (__write_nocancel)
>  hidden_proto (__close_nocancel)
>  hidden_proto (__fcntl64_nocancel)
> +hidden_proto (__ppoll_infinity_nocancel)
>  #endif
>
>  #endif /* NOT_CANCEL_H */

Also update the Hurd sysdeps/mach/hurd/not-cancel.h with a wrapper to
__poll (since it does not really support pthread cancellation).

> diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/unix/sysv/linux/ppoll_nocancel.c
> similarity index 62%
> rename from sysdeps/generic/chacha20_arch.h
> rename to sysdeps/unix/sysv/linux/ppoll_nocancel.c
> index 1b4559ccbc..28c8761566 100644
> --- a/sysdeps/generic/chacha20_arch.h
> +++ b/sysdeps/unix/sysv/linux/ppoll_nocancel.c
> @@ -1,5 +1,5 @@
> -/* Chacha20 implementation, generic interface for encrypt.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> +/* Linux ppoll syscall implementation -- non-cancellable.
> +   Copyright (C) 2018-2022 Free Software Foundation, Inc.
>     This file is part of the GNU C Library.
>
>     The GNU C Library is free software; you can redistribute it and/or
> @@ -16,9 +16,16 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>. */
>
> -static inline void
> -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> -                size_t bytes)
> +#include <unistd.h>
> +#include <sysdep-cancel.h>
> +#include <not-cancel.h>
> +
> +int
> +__ppoll_infinity_nocancel (struct pollfd *fds, nfds_t nfds)
>  {
> -  chacha20_crypt_generic (state, dst, src, bytes);
> +#ifndef __NR_ppoll_time64
> +# define __NR_ppoll_time64 __NR_ppoll
> +#endif
> +  return INLINE_SYSCALL_CALL (ppoll_time64, fds, nfds, NULL, NULL, 0);
>  }
> +hidden_def (__ppoll_infinity_nocancel)

Maybe just add an inline wrapper in sysdeps/unix/sysv/linux/not-cancel.h,
as for __getrandom_nocancel:

  static inline int
  __ppoll_infinity_nocancel (struct pollfd *fds, nfds_t nfds)
  {
  #ifndef __NR_ppoll_time64
  # define __NR_ppoll_time64 __NR_ppoll
  #endif
    return INLINE_SYSCALL_CALL (ppoll_time64, fds, nfds, NULL, NULL, 0);
  }

It avoids a lot of boilerplate code to add the internal symbol.

^ permalink raw reply	[flat|nested] 81+ messages in thread
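As a rough illustration of the retry loop under review in this message, here is a minimal sketch written against the public getrandom() wrapper rather than glibc's internal __getrandom_nocancel. The helper name fill_random is hypothetical and not part of the patch. Note one difference that is an assumption of this sketch: the public wrapper reports failure by returning -1 and setting errno, whereas the quoted v4 code compares the return value against -EINTR and -ENOSYS directly. The /dev/urandom fallback is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper, not part of glibc: fill BUF with LEN random bytes
   from getrandom(2), retrying on short reads and on EINTR.  Returns 0 on
   success and -1 (with errno set) on any other failure.  */
static int
fill_random (void *buf, size_t len)
{
  uint8_t *p = buf;

  assert (buf != NULL || len == 0);
  while (len > 0)
    {
      ssize_t l = getrandom (p, len, 0);
      if (l > 0)
        {
          /* Possibly a short read; advance and ask for the rest.  */
          p += l;
          len -= (size_t) l;
        }
      else if (l < 0 && errno == EINTR)
        continue; /* Interrupted by a signal; keep going.  */
      else
        {
          if (l == 0)
            errno = EIO; /* Should never happen for a nonzero request.  */
          return -1;
        }
    }
  return 0;
}
```

A caller that wants the abort-on-failure behaviour of the patch would treat a -1 return as fatal instead of propagating it.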
* Re: [PATCH v4] arc4random: simplify design for better safety
  2022-07-26 16:20   ` Adhemerval Zanella Netto
@ 2022-07-26 18:36     ` Jason A. Donenfeld
  0 siblings, 0 replies; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 18:36 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Florian Weimer, Cristian Rodríguez, Paul Eggert,
	Mark Harris, Eric Biggers, linux-crypto

Hi Adhemerval,

On Tue, Jul 26, 2022 at 01:20:11PM -0300, Adhemerval Zanella Netto wrote:
> > +        {
> > +          atomic_store_relaxed (&have_getrandom, false);
>
> I still think there is not much gain in this optimization: the syscall will
> most likely be present, and it is one less piece of static data. Also, we
> avoid using __ASSUME_GETRANDOM in generic code (all __ASSUME usage is within
> sysdeps and/or nptl).

Oh! *That's* what you were talking about before. Sorry I didn't catch
your meaning the first time through. Okay, so you're alright having +1
syscall overhead on old systems, so that new systems can have a byte
less of static data. I don't hold any opinions either way there and will
defer to your expertise, so I'll get rid of this part in v5.

> > +    __ppoll_infinity_nocancel;
> >      __sigtimedwait;
> >      # functions used by nscd
> >      __netlink_assert_response;
>
> There is no need to export it on GLIBC_PRIVATE, since it is not currently
> used outside libc.so. Just define it as hidden (attribute_hidden).
>
> Use attribute_hidden here and remove it from sysdeps/unix/sysv/linux/Versions.
>
> Maybe just add an inline wrapper in sysdeps/unix/sysv/linux/not-cancel.h,
> as for __getrandom_nocancel:
>
> It avoids a lot of boilerplate code to add the internal symbol.

Okay, I'll skip all the symbol stuff and just do the static inline like
getrandom has. Thanks for the suggestion; that's a lot simpler.

> Also update the Hurd sysdeps/mach/hurd/not-cancel.h with a wrapper to
> __poll (since it does not really support pthread cancellation).

Ack.

Thanks for the comments. v5 coming up shortly.

Jason

^ permalink raw reply	[flat|nested] 81+ messages in thread
* [PATCH v5] arc4random: simplify design for better safety
  2022-07-26 13:30 ` [PATCH v4] " Jason A. Donenfeld
  2022-07-26 15:21   ` Yann Droneaud
  2022-07-26 16:20   ` Adhemerval Zanella Netto
@ 2022-07-26 19:08   ` Jason A. Donenfeld
  2022-07-26 19:58     ` [PATCH v6] " Jason A. Donenfeld
  2 siblings, 1 reply; 81+ messages in thread
From: Jason A. Donenfeld @ 2022-07-26 19:08 UTC (permalink / raw)
  To: libc-alpha
  Cc: Jason A. Donenfeld, Adhemerval Zanella Netto, Florian Weimer,
	Cristian Rodríguez, Paul Eggert, Mark Harris, Eric Biggers,
	linux-crypto

Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc can work
together on optimizing that.

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering altogether.
Future improvements may include userspace learning more from the kernel
about when to do that, which might make these sorts of chacha20-based
optimizations more possible. The current heuristic of 16 MiB is
meaningless garbage that doesn't correspond to anything the kernel might
know about. So for now, let's just do something conservative that we
know is correct and won't lead to cryptographic issues for users of this
function.

This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things.
Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fall back to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, but it does to a decent swath of them, so generally
it's the best approximation we can do.

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A.
Donenfeld <Jason@zx2c4.com> --- LICENSES | 23 - NEWS | 4 +- include/stdlib.h | 3 - manual/math.texi | 13 +- stdlib/Makefile | 2 - stdlib/arc4random.c | 195 ++---- stdlib/arc4random.h | 48 -- stdlib/chacha20.c | 191 ------ stdlib/tst-arc4random-chacha20.c | 167 ----- sysdeps/aarch64/Makefile | 4 - sysdeps/aarch64/chacha20-aarch64.S | 314 ---------- sysdeps/aarch64/chacha20_arch.h | 40 -- sysdeps/generic/chacha20_arch.h | 24 - sysdeps/generic/tls-internal-struct.h | 1 - sysdeps/generic/tls-internal.c | 10 - sysdeps/mach/hurd/_Fork.c | 2 - sysdeps/mach/hurd/kernel-features.h | 1 + sysdeps/mach/hurd/not-cancel.h | 3 + sysdeps/nptl/_Fork.c | 2 - .../powerpc/powerpc64/be/multiarch/Makefile | 4 - .../powerpc64/be/multiarch/chacha20-ppc.c | 1 - .../powerpc64/be/multiarch/chacha20_arch.h | 42 -- sysdeps/powerpc/powerpc64/power8/Makefile | 5 - .../powerpc/powerpc64/power8/chacha20-ppc.c | 256 -------- .../powerpc/powerpc64/power8/chacha20_arch.h | 37 -- sysdeps/s390/s390-64/Makefile | 6 - sysdeps/s390/s390-64/chacha20-s390x.S | 573 ------------------ sysdeps/s390/s390-64/chacha20_arch.h | 45 -- sysdeps/unix/sysv/linux/kernel-features.h | 7 + sysdeps/unix/sysv/linux/not-cancel.h | 11 +- sysdeps/unix/sysv/linux/tls-internal.c | 10 - sysdeps/unix/sysv/linux/tls-internal.h | 1 - sysdeps/x86_64/Makefile | 7 - sysdeps/x86_64/chacha20-amd64-avx2.S | 328 ---------- sysdeps/x86_64/chacha20-amd64-sse2.S | 311 ---------- sysdeps/x86_64/chacha20_arch.h | 55 -- 36 files changed, 70 insertions(+), 2676 deletions(-) delete mode 100644 stdlib/arc4random.h delete mode 100644 stdlib/chacha20.c delete mode 100644 stdlib/tst-arc4random-chacha20.c delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S delete mode 100644 sysdeps/aarch64/chacha20_arch.h delete mode 100644 sysdeps/generic/chacha20_arch.h delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c delete mode 100644 
sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S delete mode 100644 sysdeps/x86_64/chacha20_arch.h diff --git a/LICENSES b/LICENSES index cd04fb6e84..530893b1dc 100644 --- a/LICENSES +++ b/LICENSES @@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov> You should have received a copy of the GNU Lesser General Public License along with this library; if not, see <https://www.gnu.org/licenses/>. */ -\f -sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S, -sysdeps/x86_64/chacha20-amd64-avx2.S, and -sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and -sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt, -with the following notices: - -Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - -This file is part of Libgcrypt. - -Libgcrypt is free software; you can redistribute it and/or modify -it under the terms of the GNU Lesser General Public License as -published by the Free Software Foundation; either version 2.1 of -the License, or (at your option) any later version. - -Libgcrypt is distributed in the hope that it will be useful, -but WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -GNU Lesser General Public License for more details. - -You should have received a copy of the GNU Lesser General Public -License along with this program; if not, see <https://www.gnu.org/licenses/>. 
diff --git a/NEWS b/NEWS index 8420a65cd0..fe531bfe1e 100644 --- a/NEWS +++ b/NEWS @@ -61,8 +61,8 @@ Major new features: is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type). * The functions arc4random, arc4random_buf, and arc4random_uniform have been - added. The functions use a pseudo-random number generator along with - entropy from the kernel. + added. The functions wrap getrandom and/or /dev/urandom to return high- + quality randomness from the kernel. Deprecated and removed features, and other changes affecting compatibility: diff --git a/include/stdlib.h b/include/stdlib.h index cae7f7cdf8..db51f4a4f6 100644 --- a/include/stdlib.h +++ b/include/stdlib.h @@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform; libc_hidden_proto (__arc4random_uniform); extern void __arc4random_buf_internal (void *buffer, size_t len) attribute_hidden; -/* Called from the fork function to reinitialize the internal cipher state - in child process. */ -extern void __arc4random_fork_subprocess (void) attribute_hidden; extern double __strtod_internal (const char *__restrict __nptr, char **__restrict __endptr, int __group) diff --git a/manual/math.texi b/manual/math.texi index 141695cc30..6d69bbff66 100644 --- a/manual/math.texi +++ b/manual/math.texi @@ -1993,17 +1993,10 @@ This section describes the random number functions provided as a GNU extension, based on OpenBSD interfaces. @Theglibc{} uses kernel entropy obtained either through @code{getrandom} -or by reading @file{/dev/urandom} to seed and periodically re-seed the -internal state. A per-thread data pool is used, which allows fast output -generation. +or by reading @file{/dev/urandom} to seed. -Although these functions provide higher random quality than ISO, BSD, and -SVID functions, these still use a Pseudo-Random generator and should not -be used in cryptographic contexts. - -The internal state is cleared and reseeded with kernel entropy on @code{fork} -and @code{_Fork}. 
It is not cleared on either a direct @code{clone} syscall -or when using @theglibc{} @code{syscall} function. +These functions provide higher random quality than ISO, BSD, and SVID +functions, and may be used in cryptographic contexts. The prototypes for these functions are in @file{stdlib.h}. @pindex stdlib.h diff --git a/stdlib/Makefile b/stdlib/Makefile index a900962685..f7b25c1981 100644 --- a/stdlib/Makefile +++ b/stdlib/Makefile @@ -246,7 +246,6 @@ tests := \ # tests tests-internal := \ - tst-arc4random-chacha20 \ tst-strtod1i \ tst-strtod3 \ tst-strtod4 \ @@ -256,7 +255,6 @@ tests-internal := \ # tests-internal tests-static := \ - tst-arc4random-chacha20 \ tst-secure-getenv \ # tests-static diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c index 65547e79aa..e819af0c99 100644 --- a/stdlib/arc4random.c +++ b/stdlib/arc4random.c @@ -1,4 +1,4 @@ -/* Pseudo Random Number Generator based on ChaCha20. +/* Pseudo Random Number Generator Copyright (C) 2022 Free Software Foundation, Inc. This file is part of the GNU C Library. @@ -16,7 +16,6 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. */ -#include <arc4random.h> #include <errno.h> #include <not-cancel.h> #include <stdio.h> @@ -24,53 +23,6 @@ #include <sys/mman.h> #include <sys/param.h> #include <sys/random.h> -#include <tls-internal.h> - -/* arc4random keeps two counters: 'have' is the current valid bytes not yet - consumed in 'buf' while 'count' is the maximum number of bytes until a - reseed. - - Both the initial seed and reseed try to obtain entropy from the kernel - and abort the process if none could be obtained. - - The state 'buf' improves the usage of the cipher calls, allowing to call - optimized implementations (if the architecture provides it) and minimize - function call overhead. */ - -#include <chacha20.c> - -/* Called from the fork function to reset the state. 
*/ -void -__arc4random_fork_subprocess (void) -{ - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state; - if (state != NULL) - { - explicit_bzero (state, sizeof (*state)); - /* Force key init. */ - state->count = -1; - } -} - -/* Return the current thread random state or try to create one if there is - none available. In the case malloc can not allocate a state, arc4random - will try to get entropy with arc4random_getentropy. */ -static struct arc4random_state_t * -arc4random_get_state (void) -{ - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state; - if (state == NULL) - { - state = malloc (sizeof (struct arc4random_state_t)); - if (state != NULL) - { - /* Force key initialization on first call. */ - state->count = -1; - __glibc_tls_internal ()->rand_state = state; - } - } - return state; -} static void arc4random_getrandom_failure (void) @@ -78,106 +30,62 @@ arc4random_getrandom_failure (void) __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n"); } -static void -arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen) +void +__arc4random_buf (void *p, size_t n) { - chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf); - - /* Mix optional user provided data. */ - if (rnd != NULL) - { - size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); - for (size_t i = 0; i < m; i++) - state->buf[i] ^= rnd[i]; - } - - /* Immediately reinit for backtracking resistance. 
*/ - chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE); - explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); - state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); -} + static bool seen_initialized = false; + size_t l; + int fd; -static void -arc4random_getentropy (void *rnd, size_t len) -{ - if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len) + if (n == 0) return; - int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom", - O_RDONLY | O_CLOEXEC)); - if (fd != -1) + for (;;) { - uint8_t *p = rnd; - uint8_t *end = p + len; - do + l = TEMP_FAILURE_RETRY (__getrandom_nocancel (p, n, 0)); + if (l > 0) { - ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p)); - if (ret <= 0) - arc4random_getrandom_failure (); - p += ret; + if ((size_t) l == n) + return; /* Done reading, success. */ + p = (uint8_t *) p + l; + n -= l; + continue; /* Interrupted by a signal; keep going. */ } - while (p < end); - - if (__close_nocancel (fd) == 0) - return; + else if (!__ASSUME_GETRANDOM && l < 0 && errno == ENOSYS) + break; /* No syscall, so fallback to /dev/urandom. */ + arc4random_getrandom_failure (); } - arc4random_getrandom_failure (); -} -/* Check if the thread context STATE should be reseed with kernel entropy - depending of requested LEN bytes. If there is less than requested, - the state is either initialized or reseeded, otherwise the internal - counter subtract the requested length. */ -static void -arc4random_check_stir (struct arc4random_state_t *state, size_t len) -{ - if (state->count <= len || state->count == -1) + if (!atomic_load_relaxed (&seen_initialized)) { - uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE]; - arc4random_getentropy (rnd, sizeof rnd); - - if (state->count == -1) - chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE); - else - arc4random_rekey (state, rnd, sizeof rnd); - - explicit_bzero (rnd, sizeof rnd); - - /* Invalidate the buf. 
*/ - state->have = 0; - memset (state->buf, 0, sizeof state->buf); - state->count = CHACHA20_RESEED_SIZE; + struct pollfd pfd = { .events = POLLIN }; + pfd.fd = TEMP_FAILURE_RETRY ( + __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY)); + if (pfd.fd < 0) + arc4random_getrandom_failure (); + if (TEMP_FAILURE_RETRY (__poll_infinity_nocancel (&pfd, 1)) < 0) + arc4random_getrandom_failure (); + if (__close_nocancel (pfd.fd) < 0) + arc4random_getrandom_failure (); + atomic_store_relaxed (&seen_initialized, true); } - else - state->count -= len; -} -void -__arc4random_buf (void *buffer, size_t len) -{ - struct arc4random_state_t *state = arc4random_get_state (); - if (__glibc_unlikely (state == NULL)) - { - arc4random_getentropy (buffer, len); - return; - } - - arc4random_check_stir (state, len); - while (len > 0) + fd = TEMP_FAILURE_RETRY ( + __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY)); + if (fd < 0) + arc4random_getrandom_failure (); + for (;;) { - if (state->have > 0) - { - size_t m = MIN (len, state->have); - uint8_t *ks = state->buf + sizeof (state->buf) - state->have; - memcpy (buffer, ks, m); - explicit_bzero (ks, m); - buffer += m; - len -= m; - state->have -= m; - } - if (state->have == 0) - arc4random_rekey (state, NULL, 0); + l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n)); + if (l <= 0) + arc4random_getrandom_failure (); + if ((size_t) l == n) + break; /* Done reading, success. 
*/ + p = (uint8_t *) p + l; + n -= l; } + if (__close_nocancel (fd) < 0) + arc4random_getrandom_failure (); } libc_hidden_def (__arc4random_buf) weak_alias (__arc4random_buf, arc4random_buf) @@ -186,22 +94,7 @@ uint32_t __arc4random (void) { uint32_t r; - - struct arc4random_state_t *state = arc4random_get_state (); - if (__glibc_unlikely (state == NULL)) - { - arc4random_getentropy (&r, sizeof (uint32_t)); - return r; - } - - arc4random_check_stir (state, sizeof (uint32_t)); - if (state->have < sizeof (uint32_t)) - arc4random_rekey (state, NULL, 0); - uint8_t *ks = state->buf + sizeof (state->buf) - state->have; - memcpy (&r, ks, sizeof (uint32_t)); - memset (ks, 0, sizeof (uint32_t)); - state->have -= sizeof (uint32_t); - + __arc4random_buf (&r, sizeof (r)); return r; } libc_hidden_def (__arc4random) diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h deleted file mode 100644 index cd39389c19..0000000000 --- a/stdlib/arc4random.h +++ /dev/null @@ -1,48 +0,0 @@ -/* Arc4random definition used on TLS. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#ifndef _CHACHA20_H -#define _CHACHA20_H - -#include <stddef.h> -#include <stdint.h> - -/* Internal ChaCha20 state. 
*/ -#define CHACHA20_STATE_LEN 16 -#define CHACHA20_BLOCK_SIZE 64 - -/* Maximum number bytes until reseed (16 MB). */ -#define CHACHA20_RESEED_SIZE (16 * 1024 * 1024) - -/* Internal arc4random buffer, used on each feedback step so offer some - backtracking protection and to allow better used of vectorized - chacha20 implementations. */ -#define CHACHA20_BUFSIZE (8 * CHACHA20_BLOCK_SIZE) - -_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE"); - -struct arc4random_state_t -{ - uint32_t ctx[CHACHA20_STATE_LEN]; - size_t have; - size_t count; - uint8_t buf[CHACHA20_BUFSIZE]; -}; - -#endif diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c deleted file mode 100644 index 2745a81315..0000000000 --- a/stdlib/chacha20.c +++ /dev/null @@ -1,191 +0,0 @@ -/* Generic ChaCha20 implementation (used on arc4random). - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <array_length.h> -#include <endian.h> -#include <stddef.h> -#include <stdint.h> -#include <string.h> - -/* 32-bit stream position, then 96-bit nonce. 
*/ -#define CHACHA20_IV_SIZE 16 -#define CHACHA20_KEY_SIZE 32 - -#define CHACHA20_STATE_LEN 16 - -/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final - XOR of the keystream with the plaintext because the plaintext is a - stream of zeros. */ - -enum chacha20_constants -{ - CHACHA20_CONSTANT_EXPA = 0x61707865U, - CHACHA20_CONSTANT_ND_3 = 0x3320646eU, - CHACHA20_CONSTANT_2_BY = 0x79622d32U, - CHACHA20_CONSTANT_TE_K = 0x6b206574U -}; - -static inline uint32_t -read_unaligned_32 (const uint8_t *p) -{ - uint32_t r; - memcpy (&r, p, sizeof (r)); - return r; -} - -static inline void -write_unaligned_32 (uint8_t *p, uint32_t v) -{ - memcpy (p, &v, sizeof (v)); -} - -#if __BYTE_ORDER == __BIG_ENDIAN -# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p)) -# define set_state(v) __builtin_bswap32 ((v)) -#else -# define read_unaligned_le32(p) read_unaligned_32 ((p)) -# define set_state(v) (v) -#endif - -static inline void -chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv) -{ - state[0] = CHACHA20_CONSTANT_EXPA; - state[1] = CHACHA20_CONSTANT_ND_3; - state[2] = CHACHA20_CONSTANT_2_BY; - state[3] = CHACHA20_CONSTANT_TE_K; - - state[4] = read_unaligned_le32 (key + 0 * sizeof (uint32_t)); - state[5] = read_unaligned_le32 (key + 1 * sizeof (uint32_t)); - state[6] = read_unaligned_le32 (key + 2 * sizeof (uint32_t)); - state[7] = read_unaligned_le32 (key + 3 * sizeof (uint32_t)); - state[8] = read_unaligned_le32 (key + 4 * sizeof (uint32_t)); - state[9] = read_unaligned_le32 (key + 5 * sizeof (uint32_t)); - state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t)); - state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t)); - - state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t)); - state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t)); - state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t)); - state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t)); -} - -static inline uint32_t -rotl32 
(unsigned int shift, uint32_t word) -{ - return (word << (shift & 31)) | (word >> ((-shift) & 31)); -} - -static void -state_final (const uint8_t *src, uint8_t *dst, uint32_t v) -{ -#ifdef CHACHA20_XOR_FINAL - v ^= read_unaligned_32 (src); -#endif - write_unaligned_32 (dst, v); -} - -static inline void -chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src) -{ - uint32_t x0, x1, x2, x3, x4, x5, x6, x7; - uint32_t x8, x9, x10, x11, x12, x13, x14, x15; - - x0 = state[0]; - x1 = state[1]; - x2 = state[2]; - x3 = state[3]; - x4 = state[4]; - x5 = state[5]; - x6 = state[6]; - x7 = state[7]; - x8 = state[8]; - x9 = state[9]; - x10 = state[10]; - x11 = state[11]; - x12 = state[12]; - x13 = state[13]; - x14 = state[14]; - x15 = state[15]; - - for (int i = 0; i < 20; i += 2) - { -#define QROUND(_x0, _x1, _x2, _x3) \ - do { \ - _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); \ - _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); \ - _x0 = _x0 + _x1; _x3 = rotl32 (8, (_x0 ^ _x3)); \ - _x2 = _x2 + _x3; _x1 = rotl32 (7, (_x1 ^ _x2)); \ - } while(0) - - QROUND (x0, x4, x8, x12); - QROUND (x1, x5, x9, x13); - QROUND (x2, x6, x10, x14); - QROUND (x3, x7, x11, x15); - - QROUND (x0, x5, x10, x15); - QROUND (x1, x6, x11, x12); - QROUND (x2, x7, x8, x13); - QROUND (x3, x4, x9, x14); - } - - state_final (&src[0], &dst[0], set_state (x0 + state[0])); - state_final (&src[4], &dst[4], set_state (x1 + state[1])); - state_final (&src[8], &dst[8], set_state (x2 + state[2])); - state_final (&src[12], &dst[12], set_state (x3 + state[3])); - state_final (&src[16], &dst[16], set_state (x4 + state[4])); - state_final (&src[20], &dst[20], set_state (x5 + state[5])); - state_final (&src[24], &dst[24], set_state (x6 + state[6])); - state_final (&src[28], &dst[28], set_state (x7 + state[7])); - state_final (&src[32], &dst[32], set_state (x8 + state[8])); - state_final (&src[36], &dst[36], set_state (x9 + state[9])); - state_final (&src[40], &dst[40], set_state (x10 + state[10])); - 
state_final (&src[44], &dst[44], set_state (x11 + state[11])); - state_final (&src[48], &dst[48], set_state (x12 + state[12])); - state_final (&src[52], &dst[52], set_state (x13 + state[13])); - state_final (&src[56], &dst[56], set_state (x14 + state[14])); - state_final (&src[60], &dst[60], set_state (x15 + state[15])); - - state[12]++; -} - -static void -__attribute_maybe_unused__ -chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - while (bytes >= CHACHA20_BLOCK_SIZE) - { - chacha20_block (state, dst, src); - - bytes -= CHACHA20_BLOCK_SIZE; - dst += CHACHA20_BLOCK_SIZE; - src += CHACHA20_BLOCK_SIZE; - } - - if (__glibc_unlikely (bytes != 0)) - { - uint8_t stream[CHACHA20_BLOCK_SIZE]; - chacha20_block (state, stream, src); - memcpy (dst, stream, bytes); - explicit_bzero (stream, sizeof stream); - } -} - -/* Get the architecture optimized version. */ -#include <chacha20_arch.h> diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c deleted file mode 100644 index 45ba54920d..0000000000 --- a/stdlib/tst-arc4random-chacha20.c +++ /dev/null @@ -1,167 +0,0 @@ -/* Basic tests for chacha20 cypher used in arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <arc4random.h> -#include <support/check.h> -#include <sys/cdefs.h> - -/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random - actual does. */ -#include <chacha20.c> - -static int -do_test (void) -{ - const uint8_t key[CHACHA20_KEY_SIZE] = - { - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - }; - const uint8_t iv[CHACHA20_IV_SIZE] = - { - 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - }; - const uint8_t expected1[CHACHA20_BUFSIZE] = - { - 0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a, - 0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d, - 0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda, - 0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f, - 0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1, - 0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07, - 0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73, - 0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69, - 0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21, - 0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8, - 0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac, - 0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6, - 0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0, - 0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3, - 0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f, - 0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e, - 0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71, - 0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3, - 0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5, - 0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34, - 0xb4, 
0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d, - 0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c, - 0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31, - 0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad, - 0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0, - 0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72, - 0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16, - 0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9, - 0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a, - 0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc, - 0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2, - 0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb, - 0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10, - 0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b, - 0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2, - 0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17, - 0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca, - 0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84, - 0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3, - 0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1, - 0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7, - 0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9, - 0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3, - 0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e, - 0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb, - 0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8, - 0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb - }; - - const uint8_t expected2[CHACHA20_BUFSIZE] = - { - 0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36, - 0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22, - 0xf3, 
0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50, - 0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10, - 0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef, - 0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30, - 0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c, - 0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6, - 0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e, - 0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90, - 0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3, - 0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba, - 0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4, - 0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea, - 0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a, - 0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80, - 0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2, - 0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8, - 0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce, - 0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb, - 0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9, - 0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc, - 0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86, - 0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0, - 0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4, - 0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1, - 0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4, - 0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f, - 0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57, - 0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97, - 0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef, - 0x55, 0x83, 0x64, 0xae, 0x81, 
0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13, - 0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65, - 0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30, - 0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6, - 0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee, - 0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c, - 0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66, - 0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a, - 0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c, - 0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65, - 0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b, - 0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88, - 0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1, - 0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0, - 0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36, - 0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18 - }; - - /* Check with the expected internal arc4random keystream buffer. Some - architecture optimizations expects a buffer with a minimum size which - is a multiple of then ChaCha20 blocksize, so they might not be prepared - to handle smaller buffers. */ - - uint8_t output[CHACHA20_BUFSIZE]; - - uint32_t state[CHACHA20_STATE_LEN]; - chacha20_init (state, key, iv); - - /* Check with the initial state. */ - uint8_t input[CHACHA20_BUFSIZE] = { 0 }; - - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); - TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE); - - /* And on the next round. 
*/ - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); - TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE); - - return 0; -} - -#include <support/test-driver.c> diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile index 7dfd1b62dd..17fb1c5b72 100644 --- a/sysdeps/aarch64/Makefile +++ b/sysdeps/aarch64/Makefile @@ -51,10 +51,6 @@ ifeq ($(subdir),csu) gen-as-const-headers += tlsdesc.sym endif -ifeq ($(subdir),stdlib) -sysdep_routines += chacha20-aarch64 -endif - ifeq ($(subdir),gmon) CFLAGS-mcount.c += -mgeneral-regs-only endif diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S deleted file mode 100644 index cce5291c5c..0000000000 --- a/sysdeps/aarch64/chacha20-aarch64.S +++ /dev/null @@ -1,314 +0,0 @@ -/* Optimized AArch64 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. 
- - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. - */ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. */ - -#include <sysdep.h> - -/* Only LE is supported. */ -#ifdef __AARCH64EL__ - -#define GET_DATA_POINTER(reg, name) \ - adrp reg, name ; \ - add reg, reg, :lo12:name - -/* 'ret' instruction replacement for straight-line speculation mitigation */ -#define ret_spec_stop \ - ret; dsb sy; isb; - -.cpu generic+simd - -.text - -/* register macros */ -#define INPUT x0 -#define DST x1 -#define SRC x2 -#define NBLKS x3 -#define ROUND x4 -#define INPUT_CTR x5 -#define INPUT_POS x6 -#define CTR x7 - -/* vector registers */ -#define X0 v16 -#define X4 v17 -#define X8 v18 -#define X12 v19 - -#define X1 v20 -#define X5 v21 - -#define X9 v22 -#define X13 v23 -#define X2 v24 -#define X6 v25 - -#define X3 v26 -#define X7 v27 -#define X11 v28 -#define X15 v29 - -#define X10 v30 -#define X14 v31 - -#define VCTR v0 -#define VTMP0 v1 -#define VTMP1 v2 -#define VTMP2 v3 -#define VTMP3 v4 -#define X12_TMP v5 -#define X13_TMP v6 -#define ROT8 v7 - -/********************************************************************** - helper macros - **********************************************************************/ - -#define _(...) 
__VA_ARGS__ - -#define vpunpckldq(s1, s2, dst) \ - zip1 dst.4s, s2.4s, s1.4s; - -#define vpunpckhdq(s1, s2, dst) \ - zip2 dst.4s, s2.4s, s1.4s; - -#define vpunpcklqdq(s1, s2, dst) \ - zip1 dst.2d, s2.2d, s1.2d; - -#define vpunpckhqdq(s1, s2, dst) \ - zip2 dst.2d, s2.2d, s1.2d; - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \ - vpunpckhdq(x1, x0, t2); \ - vpunpckldq(x1, x0, x0); \ - \ - vpunpckldq(x3, x2, t1); \ - vpunpckhdq(x3, x2, x2); \ - \ - vpunpckhqdq(t1, x0, x1); \ - vpunpcklqdq(t1, x0, x0); \ - \ - vpunpckhqdq(x2, t2, x3); \ - vpunpcklqdq(x2, t2, x2); - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define XOR(d,s1,s2) \ - eor d.16b, s2.16b, s1.16b; - -#define PLUS(ds,s) \ - add ds.4s, ds.4s, s.4s; - -#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \ - shl dst1.4s, src1.4s, #(c); \ - shl dst2.4s, src2.4s, #(c); \ - shl dst3.4s, src3.4s, #(c); \ - shl dst4.4s, src4.4s, #(c); \ - sri dst1.4s, src1.4s, #(32 - (c)); \ - sri dst2.4s, src2.4s, #(32 - (c)); \ - sri dst3.4s, src3.4s, #(32 - (c)); \ - sri dst4.4s, src4.4s, #(32 - (c)); - -#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ - tbl dst1.16b, {src1.16b}, ROT8.16b; \ - tbl dst2.16b, {src2.16b}, ROT8.16b; \ - tbl dst3.16b, {src3.16b}, ROT8.16b; \ - tbl dst4.16b, {src4.16b}, ROT8.16b; - -#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ - rev32 dst1.8h, src1.8h; \ - rev32 dst2.8h, src2.8h; \ - rev32 dst3.8h, src3.8h; \ - rev32 dst4.8h, src4.8h; - -#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \ - PLUS(a1,b1); PLUS(a2,b2); \ - PLUS(a3,b3); PLUS(a4,b4); \ - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ - ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4); \ - PLUS(c1,d1); PLUS(c2,d2); \ - PLUS(c3,d3); PLUS(c4,d4); \ - XOR(tmp1,b1,c1); 
XOR(tmp2,b2,c2); \ - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ - ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4) \ - PLUS(a1,b1); PLUS(a2,b2); \ - PLUS(a3,b3); PLUS(a4,b4); \ - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ - ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4) \ - PLUS(c1,d1); PLUS(c2,d2); \ - PLUS(c3,d3); PLUS(c4,d4); \ - XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \ - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ - ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4) \ - -.align 4 -L(__chacha20_blocks4_data_inc_counter): - .long 0,1,2,3 - -.align 4 -L(__chacha20_blocks4_data_rot8): - .byte 3,0,1,2 - .byte 7,4,5,6 - .byte 11,8,9,10 - .byte 15,12,13,14 - -.hidden __chacha20_neon_blocks4 -ENTRY (__chacha20_neon_blocks4) - /* input: - * x0: input - * x1: dst - * x2: src - * x3: nblks (multiple of 4) - */ - - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8)) - add INPUT_CTR, INPUT, #(12*4); - ld1 {ROT8.16b}, [CTR]; - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter)) - mov INPUT_POS, INPUT; - ld1 {VCTR.16b}, [CTR]; - -L(loop4): - /* Construct counter vectors X12 and X13 */ - - ld1 {X15.16b}, [INPUT_CTR]; - mov ROUND, #20; - ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS]; - - dup X12.4s, X15.s[0]; - dup X13.4s, X15.s[1]; - ldr CTR, [INPUT_CTR]; - add X12.4s, X12.4s, VCTR.4s; - dup X0.4s, VTMP1.s[0]; - dup X1.4s, VTMP1.s[1]; - dup X2.4s, VTMP1.s[2]; - dup X3.4s, VTMP1.s[3]; - dup X14.4s, X15.s[2]; - cmhi VTMP0.4s, VCTR.4s, X12.4s; - dup X15.4s, X15.s[3]; - add CTR, CTR, #4; /* Update counter */ - dup X4.4s, VTMP2.s[0]; - dup X5.4s, VTMP2.s[1]; - dup X6.4s, VTMP2.s[2]; - dup X7.4s, VTMP2.s[3]; - sub X13.4s, X13.4s, VTMP0.4s; - dup X8.4s, VTMP3.s[0]; - dup X9.4s, VTMP3.s[1]; - dup X10.4s, VTMP3.s[2]; - dup X11.4s, VTMP3.s[3]; - mov X12_TMP.16b, X12.16b; - mov X13_TMP.16b, X13.16b; - str CTR, [INPUT_CTR]; - -L(round2): - subs ROUND, ROUND, #2 - QUARTERROUND4(X0, X4, X8, X12, X1, X5, X9, X13, - X2, X6, X10, X14, X3, X7, X11, X15, - 
tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) - QUARTERROUND4(X0, X5, X10, X15, X1, X6, X11, X12, - X2, X7, X8, X13, X3, X4, X9, X14, - tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) - b.ne L(round2); - - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32; - - PLUS(X12, X12_TMP); /* INPUT + 12 * 4 + counter */ - PLUS(X13, X13_TMP); /* INPUT + 13 * 4 + counter */ - - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */ - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */ - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */ - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */ - PLUS(X0, VTMP2); - PLUS(X1, VTMP3); - PLUS(X2, X12_TMP); - PLUS(X3, X13_TMP); - - dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */ - dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */ - dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */ - dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */ - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS]; - mov INPUT_POS, INPUT; - PLUS(X4, VTMP2); - PLUS(X5, VTMP3); - PLUS(X6, X12_TMP); - PLUS(X7, X13_TMP); - - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */ - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */ - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */ - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */ - dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */ - dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */ - PLUS(X8, VTMP2); - PLUS(X9, VTMP3); - PLUS(X10, X12_TMP); - PLUS(X11, X13_TMP); - PLUS(X14, VTMP0); - PLUS(X15, VTMP1); - - transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2); - transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2); - transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2); - transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2); - - subs NBLKS, NBLKS, #4; - - st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64 - st1 {X1.16b,X5.16b}, [DST], #32; - st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64 - st1 {X10.16b,X14.16b}, [DST], #32; - st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64; - - b.ne L(loop4); - - ret_spec_stop -END (__chacha20_neon_blocks4) - -#endif diff --git a/sysdeps/aarch64/chacha20_arch.h 
b/sysdeps/aarch64/chacha20_arch.h
deleted file mode 100644
index 37dbb917f1..0000000000
--- a/sysdeps/aarch64/chacha20_arch.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <ldsodefs.h>
-#include <stdbool.h>
-
-unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
-                                      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-                size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-                  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
-                  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
-#ifdef __AARCH64EL__
-  __chacha20_neon_blocks4 (state, dst, src,
-                           CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  chacha20_crypt_generic (state, dst, src, bytes);
-#endif
-}
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
deleted file mode 100644
index 1b4559ccbc..0000000000
--- a/sysdeps/generic/chacha20_arch.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* Chacha20 implementation, generic interface for encrypt.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-                size_t bytes)
-{
-  chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/generic/tls-internal-struct.h b/sysdeps/generic/tls-internal-struct.h
index a91915831b..d76c715a96 100644
--- a/sysdeps/generic/tls-internal-struct.h
+++ b/sysdeps/generic/tls-internal-struct.h
@@ -23,7 +23,6 @@ struct tls_internal_t
 {
   char *strsignal_buf;
   char *strerror_l_buf;
-  struct arc4random_state_t *rand_state;
 };
 
 #endif
diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
index 8a0f37d509..b32b31b5a9 100644
--- a/sysdeps/generic/tls-internal.c
+++ b/sysdeps/generic/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -27,13 +26,4 @@ __glibc_tls_internal_free (void)
 {
   free (__tls_internal.strsignal_buf);
   free (__tls_internal.strerror_l_buf);
-
-  if (__tls_internal.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-         cached it won't leak any data.  */
-      explicit_bzero (__tls_internal.rand_state,
-                      sizeof (*__tls_internal.rand_state));
-      free (__tls_internal.rand_state);
-    }
 }
diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
index 667068c8cf..e60b86fab1 100644
--- a/sysdeps/mach/hurd/_Fork.c
+++ b/sysdeps/mach/hurd/_Fork.c
@@ -662,8 +662,6 @@ retry:
       _hurd_malloc_fork_child ();
       call_function_static_weak (__malloc_fork_unlock_child);
 
-      call_function_static_weak (__arc4random_fork_subprocess);
-
       /* Run things that want to run in the child task to set up.  */
       RUN_HOOK (_hurd_fork_child_hook, ());
 
diff --git a/sysdeps/mach/hurd/kernel-features.h b/sysdeps/mach/hurd/kernel-features.h
index a7579f6d68..ce97627dc8 100644
--- a/sysdeps/mach/hurd/kernel-features.h
+++ b/sysdeps/mach/hurd/kernel-features.h
@@ -21,3 +21,4 @@
    But those referring to POSIX-level features like O_* flags can be.  */
 
 #define __ASSUME_CLOSE_RANGE 1
+#define __ASSUME_GETRANDOM 1
diff --git a/sysdeps/mach/hurd/not-cancel.h b/sysdeps/mach/hurd/not-cancel.h
index 9a3a7ed59a..af5eff3559 100644
--- a/sysdeps/mach/hurd/not-cancel.h
+++ b/sysdeps/mach/hurd/not-cancel.h
@@ -77,6 +77,9 @@ __typeof (__fcntl) __fcntl_nocancel;
 #define __getrandom_nocancel(buf, size, flags) \
   __getrandom (buf, size, flags)
 
+#define __poll_infinity_nocancel(fds, nfds) \
+  __poll (fds, nfds, -1)
+
 #if IS_IN (libc)
 hidden_proto (__close_nocancel)
 hidden_proto (__close_nocancel_nostatus)
diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index 7dc02569f6..dd568992e2 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -43,8 +43,6 @@ _Fork (void)
       self->robust_head.list = &self->robust_head;
       INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
                              sizeof (struct robust_list_head));
-
-      call_function_static_weak (__arc4random_fork_subprocess);
     }
   return pid;
 }
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
deleted file mode 100644
index 8c75165f7f..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
+++ /dev/null
@@ -1,4 +0,0 @@
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
deleted file mode 100644
index cf9e735326..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
deleted file mode 100644
index 08494dc045..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
+++ /dev/null
@@ -1,42 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-                                        const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-                const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-                  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-                  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  unsigned long int hwcap = GLRO(dl_hwcap);
-  unsigned long int hwcap2 = GLRO(dl_hwcap2);
-  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
-    __chacha20_power8_blocks4 (state, dst, src,
-                               CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
index abb0aa3f11..71a59529f3 100644
--- a/sysdeps/powerpc/powerpc64/power8/Makefile
+++ b/sysdeps/powerpc/powerpc64/power8/Makefile
@@ -1,8 +1,3 @@
 ifeq ($(subdir),string)
 sysdep_routines += strcasestr-ppc64
 endif
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
deleted file mode 100644
index 0bbdcb9363..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/* Optimized PowerPC implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
- - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20 - Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. 
- */ - -#include <altivec.h> -#include <endian.h> -#include <stddef.h> -#include <stdint.h> -#include <sys/cdefs.h> - -typedef vector unsigned char vector16x_u8; -typedef vector unsigned int vector4x_u32; -typedef vector unsigned long long vector2x_u64; - -#if __BYTE_ORDER == __BIG_ENDIAN -static const vector16x_u8 le_bswap_const = - { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 }; -#endif - -static inline vector4x_u32 -vec_rol_elems (vector4x_u32 v, unsigned int idx) -{ -#if __BYTE_ORDER != __BIG_ENDIAN - return vec_sld (v, v, (16 - (4 * idx)) & 15); -#else - return vec_sld (v, v, (4 * idx) & 15); -#endif -} - -static inline vector4x_u32 -vec_load_le (unsigned long offset, const unsigned char *ptr) -{ - vector4x_u32 vec; - vec = vec_vsx_ld (offset, (const uint32_t *)ptr); -#if __BYTE_ORDER == __BIG_ENDIAN - vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec, - le_bswap_const); -#endif - return vec; -} - -static inline void -vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr) -{ -#if __BYTE_ORDER == __BIG_ENDIAN - vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec, - le_bswap_const); -#endif - vec_vsx_st (vec, offset, (uint32_t *)ptr); -} - - -static inline vector4x_u32 -vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a) -{ -#if __BYTE_ORDER == __BIG_ENDIAN - static const vector16x_u8 swap32 = - { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 }; - vector2x_u64 vec, add, sum; - - vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32); - add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32); - sum = vec + add; - return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32); -#else - return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a)); -#endif -} - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define ROTATE(v1,rolv) \ - 
__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv)) - -#define PLUS(ds,s) \ - ((ds) += (s)) - -#define XOR(ds,s) \ - ((ds) ^= (s)) - -#define ADD_U64(v,a) \ - (v = vec_add_ctr_u64(v, a)) - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0, x1, x2, x3) ({ \ - vector4x_u32 t1 = vec_mergeh(x0, x2); \ - vector4x_u32 t2 = vec_mergel(x0, x2); \ - vector4x_u32 t3 = vec_mergeh(x1, x3); \ - x3 = vec_mergel(x1, x3); \ - x0 = vec_mergeh(t1, t3); \ - x1 = vec_mergel(t1, t3); \ - x2 = vec_mergeh(t2, x3); \ - x3 = vec_mergel(t2, x3); \ - }) - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2) \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE(d1, rotate_16); ROTATE(d2, rotate_16); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE(b1, rotate_12); ROTATE(b2, rotate_12); \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE(d1, rotate_8); ROTATE(d2, rotate_8); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE(b1, rotate_7); ROTATE(b2, rotate_7); - -unsigned int attribute_hidden -__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t nblks) -{ - vector4x_u32 counters_0123 = { 0, 1, 2, 3 }; - vector4x_u32 counter_4 = { 4, 0, 0, 0 }; - vector4x_u32 rotate_16 = { 16, 16, 16, 16 }; - vector4x_u32 rotate_12 = { 12, 12, 12, 12 }; - vector4x_u32 rotate_8 = { 8, 8, 8, 8 }; - vector4x_u32 rotate_7 = { 7, 7, 7, 7 }; - vector4x_u32 state0, state1, state2, state3; - vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7; - vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15; - vector4x_u32 tmp; - int i; - - /* Force preload of constants to vector registers. 
*/ - __asm__ ("": "+v" (counters_0123) :: "memory"); - __asm__ ("": "+v" (counter_4) :: "memory"); - __asm__ ("": "+v" (rotate_16) :: "memory"); - __asm__ ("": "+v" (rotate_12) :: "memory"); - __asm__ ("": "+v" (rotate_8) :: "memory"); - __asm__ ("": "+v" (rotate_7) :: "memory"); - - state0 = vec_vsx_ld (0 * 16, state); - state1 = vec_vsx_ld (1 * 16, state); - state2 = vec_vsx_ld (2 * 16, state); - state3 = vec_vsx_ld (3 * 16, state); - - do - { - v0 = vec_splat (state0, 0); - v1 = vec_splat (state0, 1); - v2 = vec_splat (state0, 2); - v3 = vec_splat (state0, 3); - v4 = vec_splat (state1, 0); - v5 = vec_splat (state1, 1); - v6 = vec_splat (state1, 2); - v7 = vec_splat (state1, 3); - v8 = vec_splat (state2, 0); - v9 = vec_splat (state2, 1); - v10 = vec_splat (state2, 2); - v11 = vec_splat (state2, 3); - v12 = vec_splat (state3, 0); - v13 = vec_splat (state3, 1); - v14 = vec_splat (state3, 2); - v15 = vec_splat (state3, 3); - - v12 += counters_0123; - v13 -= vec_cmplt (v12, counters_0123); - - for (i = 20; i > 0; i -= 2) - { - QUARTERROUND2 (v0, v4, v8, v12, v1, v5, v9, v13) - QUARTERROUND2 (v2, v6, v10, v14, v3, v7, v11, v15) - QUARTERROUND2 (v0, v5, v10, v15, v1, v6, v11, v12) - QUARTERROUND2 (v2, v7, v8, v13, v3, v4, v9, v14) - } - - v0 += vec_splat (state0, 0); - v1 += vec_splat (state0, 1); - v2 += vec_splat (state0, 2); - v3 += vec_splat (state0, 3); - v4 += vec_splat (state1, 0); - v5 += vec_splat (state1, 1); - v6 += vec_splat (state1, 2); - v7 += vec_splat (state1, 3); - v8 += vec_splat (state2, 0); - v9 += vec_splat (state2, 1); - v10 += vec_splat (state2, 2); - v11 += vec_splat (state2, 3); - tmp = vec_splat( state3, 0); - tmp += counters_0123; - v12 += tmp; - v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123); - v14 += vec_splat (state3, 2); - v15 += vec_splat (state3, 3); - ADD_U64 (state3, counter_4); - - transpose_4x4 (v0, v1, v2, v3); - transpose_4x4 (v4, v5, v6, v7); - transpose_4x4 (v8, v9, v10, v11); - transpose_4x4 (v12, v13, v14, v15); 
- - vec_store_le (v0, (64 * 0 + 16 * 0), dst); - vec_store_le (v1, (64 * 1 + 16 * 0), dst); - vec_store_le (v2, (64 * 2 + 16 * 0), dst); - vec_store_le (v3, (64 * 3 + 16 * 0), dst); - - vec_store_le (v4, (64 * 0 + 16 * 1), dst); - vec_store_le (v5, (64 * 1 + 16 * 1), dst); - vec_store_le (v6, (64 * 2 + 16 * 1), dst); - vec_store_le (v7, (64 * 3 + 16 * 1), dst); - - vec_store_le (v8, (64 * 0 + 16 * 2), dst); - vec_store_le (v9, (64 * 1 + 16 * 2), dst); - vec_store_le (v10, (64 * 2 + 16 * 2), dst); - vec_store_le (v11, (64 * 3 + 16 * 2), dst); - - vec_store_le (v12, (64 * 0 + 16 * 3), dst); - vec_store_le (v13, (64 * 1 + 16 * 3), dst); - vec_store_le (v14, (64 * 2 + 16 * 3), dst); - vec_store_le (v15, (64 * 3 + 16 * 3), dst); - - src += 4*64; - dst += 4*64; - - nblks -= 4; - } - while (nblks); - - vec_vsx_st (state3, 3 * 16, state); - - return 0; -} diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h deleted file mode 100644 index ded06762b6..0000000000 --- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h +++ /dev/null @@ -1,37 +0,0 @@ -/* PowerPC optimization for ChaCha20. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <stdbool.h> -#include <ldsodefs.h> - -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static void -chacha20_crypt (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, - "CHACHA20_BUFSIZE not multiple of 4"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); - - __chacha20_power8_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -} diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile index 96c110f490..66ed844e68 100644 --- a/sysdeps/s390/s390-64/Makefile +++ b/sysdeps/s390/s390-64/Makefile @@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache endif endif # $(subdir) == elf - -ifeq ($(subdir),stdlib) -sysdep_routines += \ - chacha20-s390x \ - # sysdep_routines -endif diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S deleted file mode 100644 index e38504d370..0000000000 --- a/sysdeps/s390/s390-64/chacha20-s390x.S +++ /dev/null @@ -1,573 +0,0 @@ -/* Optimized s390x implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -/* chacha20-s390x.S - zSeries implementation of ChaCha20 cipher - - Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. - */ - -#include <sysdep.h> - -#ifdef HAVE_S390_VX_ASM_SUPPORT - -/* CFA expressions are used for pointing CFA and registers to - * SP relative offsets. */ -# define DW_REGNO_SP 15 - -/* Fixed length encoding used for integers for now. 
*/ -# define DW_SLEB128_7BIT(value) \ - 0x00|((value) & 0x7f) -# define DW_SLEB128_28BIT(value) \ - 0x80|((value)&0x7f), \ - 0x80|(((value)>>7)&0x7f), \ - 0x80|(((value)>>14)&0x7f), \ - 0x00|(((value)>>21)&0x7f) - -# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \ - .cfi_escape \ - 0x0f, /* DW_CFA_def_cfa_expression */ \ - DW_SLEB128_7BIT(11), /* length */ \ - 0x7f, /* DW_OP_breg15, rsp + constant */ \ - DW_SLEB128_28BIT(rsp_offs), \ - 0x06, /* DW_OP_deref */ \ - 0x23, /* DW_OP_plus_constu */ \ - DW_SLEB128_28BIT((cfa_depth)+160) - -.machine "z13+vx" -.text - -.balign 16 -.Lconsts: -.Lwordswap: - .byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3 -.Lbswap128: - .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 -.Lbswap32: - .byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 -.Lone: - .long 0, 0, 0, 1 -.Ladd_counter_0123: - .long 0, 1, 2, 3 -.Ladd_counter_4567: - .long 4, 5, 6, 7 - -/* register macros */ -#define INPUT %r2 -#define DST %r3 -#define SRC %r4 -#define NBLKS %r0 -#define ROUND %r1 - -/* stack structure */ - -#define STACK_FRAME_STD (8 * 16 + 8 * 4) -#define STACK_FRAME_F8_F15 (8 * 8) -#define STACK_FRAME_Y0_Y15 (16 * 16) -#define STACK_FRAME_CTR (4 * 16) -#define STACK_FRAME_PARAMS (6 * 8) - -#define STACK_MAX (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \ - STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \ - STACK_FRAME_PARAMS) - -#define STACK_F8 (STACK_MAX - STACK_FRAME_F8_F15) -#define STACK_F9 (STACK_F8 + 8) -#define STACK_F10 (STACK_F9 + 8) -#define STACK_F11 (STACK_F10 + 8) -#define STACK_F12 (STACK_F11 + 8) -#define STACK_F13 (STACK_F12 + 8) -#define STACK_F14 (STACK_F13 + 8) -#define STACK_F15 (STACK_F14 + 8) -#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15) -#define STACK_CTR (STACK_Y0_Y15 - STACK_FRAME_CTR) -#define STACK_INPUT (STACK_CTR - STACK_FRAME_PARAMS) -#define STACK_DST (STACK_INPUT + 8) -#define STACK_SRC (STACK_DST + 8) -#define STACK_NBLKS (STACK_SRC + 8) -#define STACK_POCTX (STACK_NBLKS + 8) -#define STACK_POSRC 
(STACK_POCTX + 8) - -#define STACK_G0_H3 STACK_Y0_Y15 - -/* vector registers */ -#define A0 %v0 -#define A1 %v1 -#define A2 %v2 -#define A3 %v3 - -#define B0 %v4 -#define B1 %v5 -#define B2 %v6 -#define B3 %v7 - -#define C0 %v8 -#define C1 %v9 -#define C2 %v10 -#define C3 %v11 - -#define D0 %v12 -#define D1 %v13 -#define D2 %v14 -#define D3 %v15 - -#define E0 %v16 -#define E1 %v17 -#define E2 %v18 -#define E3 %v19 - -#define F0 %v20 -#define F1 %v21 -#define F2 %v22 -#define F3 %v23 - -#define G0 %v24 -#define G1 %v25 -#define G2 %v26 -#define G3 %v27 - -#define H0 %v28 -#define H1 %v29 -#define H2 %v30 -#define H3 %v31 - -#define IO0 E0 -#define IO1 E1 -#define IO2 E2 -#define IO3 E3 -#define IO4 F0 -#define IO5 F1 -#define IO6 F2 -#define IO7 F3 - -#define S0 G0 -#define S1 G1 -#define S2 G2 -#define S3 G3 - -#define TMP0 H0 -#define TMP1 H1 -#define TMP2 H2 -#define TMP3 H3 - -#define X0 A0 -#define X1 A1 -#define X2 A2 -#define X3 A3 -#define X4 B0 -#define X5 B1 -#define X6 B2 -#define X7 B3 -#define X8 C0 -#define X9 C1 -#define X10 C2 -#define X11 C3 -#define X12 D0 -#define X13 D1 -#define X14 D2 -#define X15 D3 - -#define Y0 E0 -#define Y1 E1 -#define Y2 E2 -#define Y3 E3 -#define Y4 F0 -#define Y5 F1 -#define Y6 F2 -#define Y7 F3 -#define Y8 G0 -#define Y9 G1 -#define Y10 G2 -#define Y11 G3 -#define Y12 H0 -#define Y13 H1 -#define Y14 H2 -#define Y15 H3 - -/********************************************************************** - helper macros - **********************************************************************/ - -#define _ /*_*/ - -#define START_STACK(last_r) \ - lgr %r0, %r15; \ - lghi %r1, ~15; \ - stmg %r6, last_r, 6 * 8(%r15); \ - aghi %r0, -STACK_MAX; \ - ngr %r0, %r1; \ - lgr %r1, %r15; \ - cfi_def_cfa_register(1); \ - lgr %r15, %r0; \ - stg %r1, 0(%r15); \ - cfi_cfa_on_stack(0, 0); \ - std %f8, STACK_F8(%r15); \ - std %f9, STACK_F9(%r15); \ - std %f10, STACK_F10(%r15); \ - std %f11, STACK_F11(%r15); \ - std %f12, STACK_F12(%r15); \ - std %f13, 
STACK_F13(%r15); \ - std %f14, STACK_F14(%r15); \ - std %f15, STACK_F15(%r15); - -#define END_STACK(last_r) \ - lg %r1, 0(%r15); \ - ld %f8, STACK_F8(%r15); \ - ld %f9, STACK_F9(%r15); \ - ld %f10, STACK_F10(%r15); \ - ld %f11, STACK_F11(%r15); \ - ld %f12, STACK_F12(%r15); \ - ld %f13, STACK_F13(%r15); \ - ld %f14, STACK_F14(%r15); \ - ld %f15, STACK_F15(%r15); \ - lmg %r6, last_r, 6 * 8(%r1); \ - lgr %r15, %r1; \ - cfi_def_cfa_register(DW_REGNO_SP); - -#define PLUS(dst,src) \ - vaf dst, dst, src; - -#define XOR(dst,src) \ - vx dst, dst, src; - -#define ROTATE(v1,c) \ - verllf v1, v1, (c)(0); - -#define WORD_ROTATE(v1,s) \ - vsldb v1, v1, v1, ((s) * 4); - -#define DST_8(OPER, I, J) \ - OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \ - OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J); - -/********************************************************************** - round macros - **********************************************************************/ - -/********************************************************************** - 8-way chacha20 ("vertical") - **********************************************************************/ - -#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ - x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,\ - y8,y9,y10,y11,y12,y13,y14,y15,\ - op1,op2,op3,op4,op5,op6,op7,op8,\ - op9,op10,op11,op12) \ - op1; \ - PLUS(x0, x1); PLUS(x4, x5); \ - PLUS(x8, x9); PLUS(x12, x13); \ - PLUS(y0, y1); PLUS(y4, y5); \ - PLUS(y8, y9); PLUS(y12, y13); \ - op2; \ - XOR(x3, x0); XOR(x7, x4); \ - XOR(x11, x8); XOR(x15, x12); \ - XOR(y3, y0); XOR(y7, y4); \ - XOR(y11, y8); XOR(y15, y12); \ - op3; \ - ROTATE(x3, 16); ROTATE(x7, 16); \ - ROTATE(x11, 16); ROTATE(x15, 16); \ - ROTATE(y3, 16); ROTATE(y7, 16); \ - ROTATE(y11, 16); ROTATE(y15, 16); \ - op4; \ - PLUS(x2, x3); PLUS(x6, x7); \ - PLUS(x10, x11); PLUS(x14, x15); \ - PLUS(y2, y3); PLUS(y6, y7); \ - PLUS(y10, y11); PLUS(y14, y15); \ - op5; \ - XOR(x1, x2); XOR(x5, x6); \ - 
XOR(x9, x10); XOR(x13, x14); \ - XOR(y1, y2); XOR(y5, y6); \ - XOR(y9, y10); XOR(y13, y14); \ - op6; \ - ROTATE(x1,12); ROTATE(x5,12); \ - ROTATE(x9,12); ROTATE(x13,12); \ - ROTATE(y1,12); ROTATE(y5,12); \ - ROTATE(y9,12); ROTATE(y13,12); \ - op7; \ - PLUS(x0, x1); PLUS(x4, x5); \ - PLUS(x8, x9); PLUS(x12, x13); \ - PLUS(y0, y1); PLUS(y4, y5); \ - PLUS(y8, y9); PLUS(y12, y13); \ - op8; \ - XOR(x3, x0); XOR(x7, x4); \ - XOR(x11, x8); XOR(x15, x12); \ - XOR(y3, y0); XOR(y7, y4); \ - XOR(y11, y8); XOR(y15, y12); \ - op9; \ - ROTATE(x3,8); ROTATE(x7,8); \ - ROTATE(x11,8); ROTATE(x15,8); \ - ROTATE(y3,8); ROTATE(y7,8); \ - ROTATE(y11,8); ROTATE(y15,8); \ - op10; \ - PLUS(x2, x3); PLUS(x6, x7); \ - PLUS(x10, x11); PLUS(x14, x15); \ - PLUS(y2, y3); PLUS(y6, y7); \ - PLUS(y10, y11); PLUS(y14, y15); \ - op11; \ - XOR(x1, x2); XOR(x5, x6); \ - XOR(x9, x10); XOR(x13, x14); \ - XOR(y1, y2); XOR(y5, y6); \ - XOR(y9, y10); XOR(y13, y14); \ - op12; \ - ROTATE(x1,7); ROTATE(x5,7); \ - ROTATE(x9,7); ROTATE(x13,7); \ - ROTATE(y1,7); ROTATE(y5,7); \ - ROTATE(y9,7); ROTATE(y13,7); - -#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \ - QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ - x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,\ - y8,y9,y10,y11,y12,y13,y14,y15,\ - ,,,,,,,,,,,) - -#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \ - vmrhf tmp0, v0, v1; \ - vmrhf tmp1, v2, v3; \ - vmrlf tmp2, v0, v1; \ - vmrlf v3, v2, v3; \ - vmrhf tmpa, va, vb; \ - vmrhf tmpb, vc, vd; \ - vmrlf tmpc, va, vb; \ - vmrlf vd, vc, vd; \ - vpdi v0, tmp0, tmp1, 0; \ - vpdi v1, tmp0, tmp1, 5; \ - vpdi v2, tmp2, v3, 0; \ - vpdi v3, tmp2, v3, 5; \ - vpdi va, tmpa, tmpb, 0; \ - vpdi vb, tmpa, tmpb, 5; \ - vpdi vc, tmpc, vd, 0; \ - vpdi vd, tmpc, vd, 5; - -.balign 8 -.globl __chacha20_s390x_vx_blocks8 -ENTRY (__chacha20_s390x_vx_blocks8) - /* input: - * %r2: input - * %r3: dst - * 
%r4: src - * %r5: nblks (multiple of 8) - */ - - START_STACK(%r8); - lgr NBLKS, %r5; - - larl %r7, .Lconsts; - - /* Load counter. */ - lg %r8, (12 * 4)(INPUT); - rllg %r8, %r8, 32; - -.balign 4 - /* Process eight chacha20 blocks per loop. */ -.Lloop8: - vlm Y0, Y3, 0(INPUT); - - slgfi NBLKS, 8; - lghi ROUND, (20 / 2); - - /* Construct counter vectors X12/X13 & Y12/Y13. */ - vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7); - vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7); - vrepf Y12, Y3, 0; - vrepf Y13, Y3, 1; - vaccf X5, Y12, X4; - vaccf Y5, Y12, Y4; - vaf X12, Y12, X4; - vaf Y12, Y12, Y4; - vaf X13, Y13, X5; - vaf Y13, Y13, Y5; - - vrepf X0, Y0, 0; - vrepf X1, Y0, 1; - vrepf X2, Y0, 2; - vrepf X3, Y0, 3; - vrepf X4, Y1, 0; - vrepf X5, Y1, 1; - vrepf X6, Y1, 2; - vrepf X7, Y1, 3; - vrepf X8, Y2, 0; - vrepf X9, Y2, 1; - vrepf X10, Y2, 2; - vrepf X11, Y2, 3; - vrepf X14, Y3, 2; - vrepf X15, Y3, 3; - - /* Store counters for blocks 0-7. */ - vstm X12, X13, (STACK_CTR + 0 * 16)(%r15); - vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15); - - vlr Y0, X0; - vlr Y1, X1; - vlr Y2, X2; - vlr Y3, X3; - vlr Y4, X4; - vlr Y5, X5; - vlr Y6, X6; - vlr Y7, X7; - vlr Y8, X8; - vlr Y9, X9; - vlr Y10, X10; - vlr Y11, X11; - vlr Y14, X14; - vlr Y15, X15; - - /* Update and store counter. */ - agfi %r8, 8; - rllg %r5, %r8, 32; - stg %r5, (12 * 4)(INPUT); - -.balign 4 -.Lround2_8: - QUARTERROUND4_V8(X0, X4, X8, X12, X1, X5, X9, X13, - X2, X6, X10, X14, X3, X7, X11, X15, - Y0, Y4, Y8, Y12, Y1, Y5, Y9, Y13, - Y2, Y6, Y10, Y14, Y3, Y7, Y11, Y15); - QUARTERROUND4_V8(X0, X5, X10, X15, X1, X6, X11, X12, - X2, X7, X8, X13, X3, X4, X9, X14, - Y0, Y5, Y10, Y15, Y1, Y6, Y11, Y12, - Y2, Y7, Y8, Y13, Y3, Y4, Y9, Y14); - brctg ROUND, .Lround2_8; - - /* Store blocks 4-7. */ - vstm Y0, Y15, STACK_Y0_Y15(%r15); - - /* Load counters for blocks 0-3. */ - vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15); - - lghi ROUND, 1; - j .Lfirst_output_4blks_8; - -.balign 4 -.Lsecond_output_4blks_8: - /* Load blocks 4-7. 
*/ - vlm X0, X15, STACK_Y0_Y15(%r15); - - /* Load counters for blocks 4-7. */ - vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15); - - lghi ROUND, 0; - -.balign 4 - /* Output four chacha20 blocks per loop. */ -.Lfirst_output_4blks_8: - vlm Y12, Y15, 0(INPUT); - PLUS(X12, Y0); - PLUS(X13, Y1); - vrepf Y0, Y12, 0; - vrepf Y1, Y12, 1; - vrepf Y2, Y12, 2; - vrepf Y3, Y12, 3; - vrepf Y4, Y13, 0; - vrepf Y5, Y13, 1; - vrepf Y6, Y13, 2; - vrepf Y7, Y13, 3; - vrepf Y8, Y14, 0; - vrepf Y9, Y14, 1; - vrepf Y10, Y14, 2; - vrepf Y11, Y14, 3; - vrepf Y14, Y15, 2; - vrepf Y15, Y15, 3; - PLUS(X0, Y0); - PLUS(X1, Y1); - PLUS(X2, Y2); - PLUS(X3, Y3); - PLUS(X4, Y4); - PLUS(X5, Y5); - PLUS(X6, Y6); - PLUS(X7, Y7); - PLUS(X8, Y8); - PLUS(X9, Y9); - PLUS(X10, Y10); - PLUS(X11, Y11); - PLUS(X14, Y14); - PLUS(X15, Y15); - - vl Y15, (.Lbswap32 - .Lconsts)(%r7); - TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7, - Y9, Y10, Y11, Y12, Y13, Y14); - TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15, - Y9, Y10, Y11, Y12, Y13, Y14); - - vlm Y0, Y14, 0(SRC); - vperm X0, X0, X0, Y15; - vperm X1, X1, X1, Y15; - vperm X2, X2, X2, Y15; - vperm X3, X3, X3, Y15; - vperm X4, X4, X4, Y15; - vperm X5, X5, X5, Y15; - vperm X6, X6, X6, Y15; - vperm X7, X7, X7, Y15; - vperm X8, X8, X8, Y15; - vperm X9, X9, X9, Y15; - vperm X10, X10, X10, Y15; - vperm X11, X11, X11, Y15; - vperm X12, X12, X12, Y15; - vperm X13, X13, X13, Y15; - vperm X14, X14, X14, Y15; - vperm X15, X15, X15, Y15; - vl Y15, (15 * 16)(SRC); - - XOR(Y0, X0); - XOR(Y1, X4); - XOR(Y2, X8); - XOR(Y3, X12); - XOR(Y4, X1); - XOR(Y5, X5); - XOR(Y6, X9); - XOR(Y7, X13); - XOR(Y8, X2); - XOR(Y9, X6); - XOR(Y10, X10); - XOR(Y11, X14); - XOR(Y12, X3); - XOR(Y13, X7); - XOR(Y14, X11); - XOR(Y15, X15); - vstm Y0, Y15, 0(DST); - - aghi SRC, 256; - aghi DST, 256; - - clgije ROUND, 1, .Lsecond_output_4blks_8; - - clgijhe NBLKS, 8, .Lloop8; - - - END_STACK(%r8); - xgr %r2, %r2; - br %r14; -END (__chacha20_s390x_vx_blocks8) - -#endif /* HAVE_S390_VX_ASM_SUPPORT */ diff 
--git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h deleted file mode 100644 index 0c6abf77e8..0000000000 --- a/sysdeps/s390/s390-64/chacha20_arch.h +++ /dev/null @@ -1,45 +0,0 @@ -/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <stdbool.h> -#include <ldsodefs.h> -#include <sys/auxv.h> - -unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static inline void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ -#ifdef HAVE_S390_VX_ASM_SUPPORT - _Static_assert (CHACHA20_BUFSIZE % 8 == 0, - "CHACHA20_BUFSIZE not multiple of 8"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); - - if (GLRO(dl_hwcap) & HWCAP_S390_VX) - { - __chacha20_s390x_vx_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); - return; - } -#endif - chacha20_crypt_generic (state, dst, src, bytes); -} diff --git a/sysdeps/unix/sysv/linux/kernel-features.h b/sysdeps/unix/sysv/linux/kernel-features.h index 74adc3956b..75d5f953d4 100644 --- a/sysdeps/unix/sysv/linux/kernel-features.h +++ b/sysdeps/unix/sysv/linux/kernel-features.h @@ -236,4 +236,11 @@ # define __ASSUME_FUTEX_LOCK_PI2 0 #endif +/* The getrandom() syscall was added in 3.17. 
*/ +#if __LINUX_KERNEL_VERSION >= 0x031100 +# define __ASSUME_GETRANDOM 1 +#else +# define __ASSUME_GETRANDOM 0 +#endif + #endif /* kernel-features.h */ diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h index 2c58d5ae2f..4fcdf08c9a 100644 --- a/sysdeps/unix/sysv/linux/not-cancel.h +++ b/sysdeps/unix/sysv/linux/not-cancel.h @@ -23,6 +23,7 @@ #include <sysdep.h> #include <errno.h> #include <unistd.h> +#include <sys/poll.h> #include <sys/syscall.h> #include <sys/wait.h> #include <time.h> @@ -70,9 +71,17 @@ __writev_nocancel_nostatus (int fd, const struct iovec *iov, int iovcnt) static inline int __getrandom_nocancel (void *buf, size_t buflen, unsigned int flags) { - return INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags); + return INLINE_SYSCALL_CALL (getrandom, buf, buflen, flags); } +static inline int +__poll_infinity_nocancel (struct pollfd *fds, nfds_t nfds) +{ +#ifndef __NR_ppoll_time64 +# define __NR_ppoll_time64 __NR_ppoll +#endif + return INLINE_SYSCALL_CALL (ppoll_time64, fds, nfds, NULL, NULL, 0); +} /* Uncancelable fcntl. */ __typeof (__fcntl) __fcntl64_nocancel; diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c index 0326ebb767..c8a9ed2d40 100644 --- a/sysdeps/unix/sysv/linux/tls-internal.c +++ b/sysdeps/unix/sysv/linux/tls-internal.c @@ -16,7 +16,6 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. */ -#include <stdlib/arc4random.h> #include <string.h> #include <tls-internal.h> @@ -26,13 +25,4 @@ __glibc_tls_internal_free (void) struct pthread *self = THREAD_SELF; free (self->tls_state.strsignal_buf); free (self->tls_state.strerror_l_buf); - - if (self->tls_state.rand_state != NULL) - { - /* Clear any lingering random state prior so if the thread stack is - cached it won't leak any data. 
*/ - explicit_bzero (self->tls_state.rand_state, - sizeof (*self->tls_state.rand_state)); - free (self->tls_state.rand_state); - } } diff --git a/sysdeps/unix/sysv/linux/tls-internal.h b/sysdeps/unix/sysv/linux/tls-internal.h index ebc65d896a..2ebe977802 100644 --- a/sysdeps/unix/sysv/linux/tls-internal.h +++ b/sysdeps/unix/sysv/linux/tls-internal.h @@ -28,7 +28,6 @@ __glibc_tls_internal (void) return &THREAD_SELF->tls_state; } -/* Reset the arc4random TCB state on fork. */ extern void __glibc_tls_internal_free (void) attribute_hidden; #endif diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile index 1178475d75..c19bef2dec 100644 --- a/sysdeps/x86_64/Makefile +++ b/sysdeps/x86_64/Makefile @@ -5,13 +5,6 @@ ifeq ($(subdir),csu) gen-as-const-headers += link-defines.sym endif -ifeq ($(subdir),stdlib) -sysdep_routines += \ - chacha20-amd64-sse2 \ - chacha20-amd64-avx2 \ - # sysdep_routines -endif - ifeq ($(subdir),gmon) sysdep_routines += _mcount # We cannot compile _mcount.S with -pg because that would create diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S deleted file mode 100644 index aefd1cdbd0..0000000000 --- a/sysdeps/x86_64/chacha20-amd64-avx2.S +++ /dev/null @@ -1,328 +0,0 @@ -/* Optimized AVX2 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. 
- - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-amd64-avx2.S - AVX2 implementation of ChaCha20 cipher - - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. -*/ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. 
*/ - -#include <sysdep.h> - -#ifdef PIC -# define rRIP (%rip) -#else -# define rRIP -#endif - -/* register macros */ -#define INPUT %rdi -#define DST %rsi -#define SRC %rdx -#define NBLKS %rcx -#define ROUND %eax - -/* stack structure */ -#define STACK_VEC_X12 (32) -#define STACK_VEC_X13 (32 + STACK_VEC_X12) -#define STACK_TMP (32 + STACK_VEC_X13) -#define STACK_TMP1 (32 + STACK_TMP) - -#define STACK_MAX (32 + STACK_TMP1) - -/* vector registers */ -#define X0 %ymm0 -#define X1 %ymm1 -#define X2 %ymm2 -#define X3 %ymm3 -#define X4 %ymm4 -#define X5 %ymm5 -#define X6 %ymm6 -#define X7 %ymm7 -#define X8 %ymm8 -#define X9 %ymm9 -#define X10 %ymm10 -#define X11 %ymm11 -#define X12 %ymm12 -#define X13 %ymm13 -#define X14 %ymm14 -#define X15 %ymm15 - -#define X0h %xmm0 -#define X1h %xmm1 -#define X2h %xmm2 -#define X3h %xmm3 -#define X4h %xmm4 -#define X5h %xmm5 -#define X6h %xmm6 -#define X7h %xmm7 -#define X8h %xmm8 -#define X9h %xmm9 -#define X10h %xmm10 -#define X11h %xmm11 -#define X12h %xmm12 -#define X13h %xmm13 -#define X14h %xmm14 -#define X15h %xmm15 - -/********************************************************************** - helper macros - **********************************************************************/ - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0,x1,x2,x3,t1,t2) \ - vpunpckhdq x1, x0, t2; \ - vpunpckldq x1, x0, x0; \ - \ - vpunpckldq x3, x2, t1; \ - vpunpckhdq x3, x2, x2; \ - \ - vpunpckhqdq t1, x0, x1; \ - vpunpcklqdq t1, x0, x0; \ - \ - vpunpckhqdq x2, t2, x3; \ - vpunpcklqdq x2, t2, x2; - -/* 2x2 128-bit matrix transpose */ -#define transpose_16byte_2x2(x0,x1,t1) \ - vmovdqa x0, t1; \ - vperm2i128 $0x20, x1, x0, x0; \ - vperm2i128 $0x31, x1, t1, x1; - -/********************************************************************** - 8-way chacha20 - **********************************************************************/ - -#define ROTATE2(v1,v2,c,tmp) \ - vpsrld $(32 - (c)), v1, tmp; \ - vpslld $(c), v1, v1; \ - vpaddb tmp, v1, v1; 
\ - vpsrld $(32 - (c)), v2, tmp; \ - vpslld $(c), v2, v2; \ - vpaddb tmp, v2, v2; - -#define ROTATE_SHUF_2(v1,v2,shuf) \ - vpshufb shuf, v1, v1; \ - vpshufb shuf, v2, v2; - -#define XOR(ds,s) \ - vpxor s, ds, ds; - -#define PLUS(ds,s) \ - vpaddd s, ds, ds; - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\ - interleave_op1,interleave_op2,\ - interleave_op3,interleave_op4) \ - vbroadcasti128 .Lshuf_rol16 rRIP, tmp1; \ - interleave_op1; \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE_SHUF_2(d1, d2, tmp1); \ - interleave_op2; \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 12, tmp1); \ - vbroadcasti128 .Lshuf_rol8 rRIP, tmp1; \ - interleave_op3; \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE_SHUF_2(d1, d2, tmp1); \ - interleave_op4; \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 7, tmp1); - - .section .text.avx2, "ax", @progbits - .align 32 -chacha20_data: -L(shuf_rol16): - .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13 -L(shuf_rol8): - .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14 -L(inc_counter): - .byte 0,1,2,3,4,5,6,7 -L(unsigned_cmp): - .long 0x80000000 - - .hidden __chacha20_avx2_blocks8 -ENTRY (__chacha20_avx2_blocks8) - /* input: - * %rdi: input - * %rsi: dst - * %rdx: src - * %rcx: nblks (multiple of 8) - */ - vzeroupper; - - pushq %rbp; - cfi_adjust_cfa_offset(8); - cfi_rel_offset(rbp, 0) - movq %rsp, %rbp; - cfi_def_cfa_register(rbp); - - subq $STACK_MAX, %rsp; - andq $~31, %rsp; - -L(loop8): - mov $20, ROUND; - - /* Construct counter vectors X12 and X13 */ - vpmovzxbd L(inc_counter) rRIP, X0; - vpbroadcastd L(unsigned_cmp) rRIP, X2; - vpbroadcastd (12 * 4)(INPUT), X12; - vpbroadcastd (13 * 4)(INPUT), X13; - vpaddd X0, X12, X12; - vpxor X2, X0, X0; - vpxor X2, X12, X1; - vpcmpgtd X1, X0, X0; - vpsubd X0, X13, X13; - vmovdqa X12, (STACK_VEC_X12)(%rsp); - vmovdqa X13, (STACK_VEC_X13)(%rsp); - - /* Load vectors */ - vpbroadcastd (0 * 4)(INPUT), X0; - vpbroadcastd (1 * 
4)(INPUT), X1; - vpbroadcastd (2 * 4)(INPUT), X2; - vpbroadcastd (3 * 4)(INPUT), X3; - vpbroadcastd (4 * 4)(INPUT), X4; - vpbroadcastd (5 * 4)(INPUT), X5; - vpbroadcastd (6 * 4)(INPUT), X6; - vpbroadcastd (7 * 4)(INPUT), X7; - vpbroadcastd (8 * 4)(INPUT), X8; - vpbroadcastd (9 * 4)(INPUT), X9; - vpbroadcastd (10 * 4)(INPUT), X10; - vpbroadcastd (11 * 4)(INPUT), X11; - vpbroadcastd (14 * 4)(INPUT), X14; - vpbroadcastd (15 * 4)(INPUT), X15; - vmovdqa X15, (STACK_TMP)(%rsp); - -L(round2): - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X15,,,,) - vmovdqa (STACK_TMP)(%rsp), X15; - vmovdqa X8, (STACK_TMP)(%rsp); - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,,,,) - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,,,,) - vmovdqa (STACK_TMP)(%rsp), X8; - vmovdqa X15, (STACK_TMP)(%rsp); - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X15,,,,) - sub $2, ROUND; - jnz L(round2); - - vmovdqa X8, (STACK_TMP1)(%rsp); - - /* tmp := X15 */ - vpbroadcastd (0 * 4)(INPUT), X15; - PLUS(X0, X15); - vpbroadcastd (1 * 4)(INPUT), X15; - PLUS(X1, X15); - vpbroadcastd (2 * 4)(INPUT), X15; - PLUS(X2, X15); - vpbroadcastd (3 * 4)(INPUT), X15; - PLUS(X3, X15); - vpbroadcastd (4 * 4)(INPUT), X15; - PLUS(X4, X15); - vpbroadcastd (5 * 4)(INPUT), X15; - PLUS(X5, X15); - vpbroadcastd (6 * 4)(INPUT), X15; - PLUS(X6, X15); - vpbroadcastd (7 * 4)(INPUT), X15; - PLUS(X7, X15); - transpose_4x4(X0, X1, X2, X3, X8, X15); - transpose_4x4(X4, X5, X6, X7, X8, X15); - vmovdqa (STACK_TMP1)(%rsp), X8; - transpose_16byte_2x2(X0, X4, X15); - transpose_16byte_2x2(X1, X5, X15); - transpose_16byte_2x2(X2, X6, X15); - transpose_16byte_2x2(X3, X7, X15); - vmovdqa (STACK_TMP)(%rsp), X15; - vmovdqu X0, (64 * 0 + 16 * 0)(DST) - vmovdqu X1, (64 * 1 + 16 * 0)(DST) - vpbroadcastd (8 * 4)(INPUT), X0; - PLUS(X8, X0); - vpbroadcastd (9 * 4)(INPUT), X0; - PLUS(X9, X0); - vpbroadcastd (10 * 4)(INPUT), X0; - PLUS(X10, X0); - vpbroadcastd (11 * 4)(INPUT), X0; - PLUS(X11, X0); - vmovdqa 
(STACK_VEC_X12)(%rsp), X0; - PLUS(X12, X0); - vmovdqa (STACK_VEC_X13)(%rsp), X0; - PLUS(X13, X0); - vpbroadcastd (14 * 4)(INPUT), X0; - PLUS(X14, X0); - vpbroadcastd (15 * 4)(INPUT), X0; - PLUS(X15, X0); - vmovdqu X2, (64 * 2 + 16 * 0)(DST) - vmovdqu X3, (64 * 3 + 16 * 0)(DST) - - /* Update counter */ - addq $8, (12 * 4)(INPUT); - - transpose_4x4(X8, X9, X10, X11, X0, X1); - transpose_4x4(X12, X13, X14, X15, X0, X1); - vmovdqu X4, (64 * 4 + 16 * 0)(DST) - vmovdqu X5, (64 * 5 + 16 * 0)(DST) - transpose_16byte_2x2(X8, X12, X0); - transpose_16byte_2x2(X9, X13, X0); - transpose_16byte_2x2(X10, X14, X0); - transpose_16byte_2x2(X11, X15, X0); - vmovdqu X6, (64 * 6 + 16 * 0)(DST) - vmovdqu X7, (64 * 7 + 16 * 0)(DST) - vmovdqu X8, (64 * 0 + 16 * 2)(DST) - vmovdqu X9, (64 * 1 + 16 * 2)(DST) - vmovdqu X10, (64 * 2 + 16 * 2)(DST) - vmovdqu X11, (64 * 3 + 16 * 2)(DST) - vmovdqu X12, (64 * 4 + 16 * 2)(DST) - vmovdqu X13, (64 * 5 + 16 * 2)(DST) - vmovdqu X14, (64 * 6 + 16 * 2)(DST) - vmovdqu X15, (64 * 7 + 16 * 2)(DST) - - sub $8, NBLKS; - lea (8 * 64)(DST), DST; - lea (8 * 64)(SRC), SRC; - jnz L(loop8); - - vzeroupper; - - /* eax zeroed by round loop. */ - leave; - cfi_adjust_cfa_offset(-8) - cfi_def_cfa_register(%rsp); - ret; - int3; -END(__chacha20_avx2_blocks8) diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S deleted file mode 100644 index 351a1109c6..0000000000 --- a/sysdeps/x86_64/chacha20-amd64-sse2.S +++ /dev/null @@ -1,311 +0,0 @@ -/* Optimized SSE2 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. 
- - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-amd64-ssse3.S - SSSE3 implementation of ChaCha20 cipher - - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. -*/ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. 
*/ - -#include <sysdep.h> -#include <isa-level.h> - -#if MINIMUM_X86_ISA_LEVEL <= 2 - -#ifdef PIC -# define rRIP (%rip) -#else -# define rRIP -#endif - -/* 'ret' instruction replacement for straight-line speculation mitigation */ -#define ret_spec_stop \ - ret; int3; - -/* register macros */ -#define INPUT %rdi -#define DST %rsi -#define SRC %rdx -#define NBLKS %rcx -#define ROUND %eax - -/* stack structure */ -#define STACK_VEC_X12 (16) -#define STACK_VEC_X13 (16 + STACK_VEC_X12) -#define STACK_TMP (16 + STACK_VEC_X13) -#define STACK_TMP1 (16 + STACK_TMP) -#define STACK_TMP2 (16 + STACK_TMP1) - -#define STACK_MAX (16 + STACK_TMP2) - -/* vector registers */ -#define X0 %xmm0 -#define X1 %xmm1 -#define X2 %xmm2 -#define X3 %xmm3 -#define X4 %xmm4 -#define X5 %xmm5 -#define X6 %xmm6 -#define X7 %xmm7 -#define X8 %xmm8 -#define X9 %xmm9 -#define X10 %xmm10 -#define X11 %xmm11 -#define X12 %xmm12 -#define X13 %xmm13 -#define X14 %xmm14 -#define X15 %xmm15 - -/********************************************************************** - helper macros - **********************************************************************/ - -/* 4x4 32-bit integer matrix transpose */ -#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \ - movdqa x0, t2; \ - punpckhdq x1, t2; \ - punpckldq x1, x0; \ - \ - movdqa x2, t1; \ - punpckldq x3, t1; \ - punpckhdq x3, x2; \ - \ - movdqa x0, x1; \ - punpckhqdq t1, x1; \ - punpcklqdq t1, x0; \ - \ - movdqa t2, x3; \ - punpckhqdq x2, x3; \ - punpcklqdq x2, t2; \ - movdqa t2, x2; - -/* fill xmm register with 32-bit value from memory */ -#define PBROADCASTD(mem32, xreg) \ - movd mem32, xreg; \ - pshufd $0, xreg, xreg; - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define ROTATE2(v1,v2,c,tmp1,tmp2) \ - movdqa v1, tmp1; \ - movdqa v2, tmp2; \ - psrld $(32 - (c)), v1; \ - pslld $(c), tmp1; \ - paddb tmp1, v1; \ - psrld $(32 - (c)), v2; \ - 
pslld $(c), tmp2; \ - paddb tmp2, v2; - -#define XOR(ds,s) \ - pxor s, ds; - -#define PLUS(ds,s) \ - paddd s, ds; - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2) \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE2(d1, d2, 16, tmp1, tmp2); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 12, tmp1, tmp2); \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE2(d1, d2, 8, tmp1, tmp2); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 7, tmp1, tmp2); - - .section .text.sse2,"ax",@progbits - -chacha20_data: - .align 16 -L(counter1): - .long 1,0,0,0 -L(inc_counter): - .long 0,1,2,3 -L(unsigned_cmp): - .long 0x80000000,0x80000000,0x80000000,0x80000000 - - .hidden __chacha20_sse2_blocks4 -ENTRY (__chacha20_sse2_blocks4) - /* input: - * %rdi: input - * %rsi: dst - * %rdx: src - * %rcx: nblks (multiple of 4) - */ - - pushq %rbp; - cfi_adjust_cfa_offset(8); - cfi_rel_offset(rbp, 0) - movq %rsp, %rbp; - cfi_def_cfa_register(%rbp); - - subq $STACK_MAX, %rsp; - andq $~15, %rsp; - -L(loop4): - mov $20, ROUND; - - /* Construct counter vectors X12 and X13 */ - movdqa L(inc_counter) rRIP, X0; - movdqa L(unsigned_cmp) rRIP, X2; - PBROADCASTD((12 * 4)(INPUT), X12); - PBROADCASTD((13 * 4)(INPUT), X13); - paddd X0, X12; - movdqa X12, X1; - pxor X2, X0; - pxor X2, X1; - pcmpgtd X1, X0; - psubd X0, X13; - movdqa X12, (STACK_VEC_X12)(%rsp); - movdqa X13, (STACK_VEC_X13)(%rsp); - - /* Load vectors */ - PBROADCASTD((0 * 4)(INPUT), X0); - PBROADCASTD((1 * 4)(INPUT), X1); - PBROADCASTD((2 * 4)(INPUT), X2); - PBROADCASTD((3 * 4)(INPUT), X3); - PBROADCASTD((4 * 4)(INPUT), X4); - PBROADCASTD((5 * 4)(INPUT), X5); - PBROADCASTD((6 * 4)(INPUT), X6); - PBROADCASTD((7 * 4)(INPUT), X7); - PBROADCASTD((8 * 4)(INPUT), X8); - PBROADCASTD((9 * 4)(INPUT), X9); - PBROADCASTD((10 * 4)(INPUT), X10); - PBROADCASTD((11 * 4)(INPUT), X11); - PBROADCASTD((14 * 4)(INPUT), X14); - PBROADCASTD((15 * 4)(INPUT), X15); - movdqa X11, 
(STACK_TMP)(%rsp); - movdqa X15, (STACK_TMP1)(%rsp); - -L(round2_4): - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X11,X15) - movdqa (STACK_TMP)(%rsp), X11; - movdqa (STACK_TMP1)(%rsp), X15; - movdqa X8, (STACK_TMP)(%rsp); - movdqa X9, (STACK_TMP1)(%rsp); - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,X9) - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,X9) - movdqa (STACK_TMP)(%rsp), X8; - movdqa (STACK_TMP1)(%rsp), X9; - movdqa X11, (STACK_TMP)(%rsp); - movdqa X15, (STACK_TMP1)(%rsp); - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X11,X15) - sub $2, ROUND; - jnz L(round2_4); - - /* tmp := X15 */ - movdqa (STACK_TMP)(%rsp), X11; - PBROADCASTD((0 * 4)(INPUT), X15); - PLUS(X0, X15); - PBROADCASTD((1 * 4)(INPUT), X15); - PLUS(X1, X15); - PBROADCASTD((2 * 4)(INPUT), X15); - PLUS(X2, X15); - PBROADCASTD((3 * 4)(INPUT), X15); - PLUS(X3, X15); - PBROADCASTD((4 * 4)(INPUT), X15); - PLUS(X4, X15); - PBROADCASTD((5 * 4)(INPUT), X15); - PLUS(X5, X15); - PBROADCASTD((6 * 4)(INPUT), X15); - PLUS(X6, X15); - PBROADCASTD((7 * 4)(INPUT), X15); - PLUS(X7, X15); - PBROADCASTD((8 * 4)(INPUT), X15); - PLUS(X8, X15); - PBROADCASTD((9 * 4)(INPUT), X15); - PLUS(X9, X15); - PBROADCASTD((10 * 4)(INPUT), X15); - PLUS(X10, X15); - PBROADCASTD((11 * 4)(INPUT), X15); - PLUS(X11, X15); - movdqa (STACK_VEC_X12)(%rsp), X15; - PLUS(X12, X15); - movdqa (STACK_VEC_X13)(%rsp), X15; - PLUS(X13, X15); - movdqa X13, (STACK_TMP)(%rsp); - PBROADCASTD((14 * 4)(INPUT), X15); - PLUS(X14, X15); - movdqa (STACK_TMP1)(%rsp), X15; - movdqa X14, (STACK_TMP1)(%rsp); - PBROADCASTD((15 * 4)(INPUT), X13); - PLUS(X15, X13); - movdqa X15, (STACK_TMP2)(%rsp); - - /* Update counter */ - addq $4, (12 * 4)(INPUT); - - TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15); - movdqu X0, (64 * 0 + 16 * 0)(DST) - movdqu X1, (64 * 1 + 16 * 0)(DST) - movdqu X2, (64 * 2 + 16 * 0)(DST) - movdqu X3, (64 * 3 + 16 * 0)(DST) - TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2); - movdqa (STACK_TMP)(%rsp), 
X13; - movdqa (STACK_TMP1)(%rsp), X14; - movdqa (STACK_TMP2)(%rsp), X15; - movdqu X4, (64 * 0 + 16 * 1)(DST) - movdqu X5, (64 * 1 + 16 * 1)(DST) - movdqu X6, (64 * 2 + 16 * 1)(DST) - movdqu X7, (64 * 3 + 16 * 1)(DST) - TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2); - movdqu X8, (64 * 0 + 16 * 2)(DST) - movdqu X9, (64 * 1 + 16 * 2)(DST) - movdqu X10, (64 * 2 + 16 * 2)(DST) - movdqu X11, (64 * 3 + 16 * 2)(DST) - TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2); - movdqu X12, (64 * 0 + 16 * 3)(DST) - movdqu X13, (64 * 1 + 16 * 3)(DST) - movdqu X14, (64 * 2 + 16 * 3)(DST) - movdqu X15, (64 * 3 + 16 * 3)(DST) - - sub $4, NBLKS; - lea (4 * 64)(DST), DST; - lea (4 * 64)(SRC), SRC; - jnz L(loop4); - - /* eax zeroed by round loop. */ - leave; - cfi_adjust_cfa_offset(-8) - cfi_def_cfa_register(%rsp); - ret_spec_stop; -END (__chacha20_sse2_blocks4) - -#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */ diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h deleted file mode 100644 index 6f3784e392..0000000000 --- a/sysdeps/x86_64/chacha20_arch.h +++ /dev/null @@ -1,55 +0,0 @@ -/* Chacha20 implementation, used on arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <isa-level.h> -#include <ldsodefs.h> -#include <cpu-features.h> -#include <sys/param.h> - -unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; -unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static inline void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0, - "CHACHA20_BUFSIZE not multiple of 4 or 8"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); - -#if MINIMUM_X86_ISA_LEVEL > 2 - __chacha20_avx2_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -#else - const struct cpu_features* cpu_features = __get_cpu_features (); - - /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. */ - if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) - && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !)) - __chacha20_avx2_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); - else - __chacha20_sse2_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -#endif -} -- 2.35.1 ^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH v6] arc4random: simplify design for better safety 2022-07-26 19:08 ` [PATCH v5] " Jason A. Donenfeld @ 2022-07-26 19:58 ` Jason A. Donenfeld 2022-07-26 20:17 ` Adhemerval Zanella Netto 2022-07-28 10:29 ` Szabolcs Nagy 0 siblings, 2 replies; 81+ messages in thread From: Jason A. Donenfeld @ 2022-07-26 19:58 UTC (permalink / raw) To: libc-alpha, adhemerval.zanella Cc: Jason A. Donenfeld, Florian Weimer, Cristian Rodríguez, Paul Eggert, Mark Harris, Eric Biggers, linux-crypto Rather than buffering 16 MiB of entropy in userspace (by way of chacha20), simply call getrandom() every time. This approach is doubtlessly slower, for now, but trying to prematurely optimize arc4random appears to be leading toward all sorts of nasty properties and gotchas. Instead, this patch takes a much more conservative approach. The interface is added as a basic loop wrapper around getrandom(), and then later, the kernel and libc can work together on optimizing that. This prevents numerous issues in which userspace is unaware of when it really must throw away its buffer, since we avoid buffering altogether. Future improvements may include userspace learning more from the kernel about when to do that, which might make these sorts of chacha20-based optimizations more possible. The current heuristic of 16 MiB is meaningless garbage that doesn't correspond to anything the kernel might know about. So for now, let's just do something conservative that we know is correct and won't lead to cryptographic issues for users of this function. This patch might be considered along the lines of, "optimization is the root of all evil," in that the much more complex implementation it replaces moves too fast without considering security implications, whereas the incremental approach done here is a much safer way of going about things. Once this lands, we can take our time in optimizing this properly using new interplay between the kernel and userspace.
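As a rough sketch of the "basic loop wrapper around getrandom()" idea, assuming only the public getrandom() declared in <sys/random.h>: the helper name fill_random is hypothetical, this is not the code in the patch, it leaves out the /dev/urandom fallback, and it calls abort() where glibc would report a fatal error.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper (not part of glibc): fill BUF with N bytes taken
   straight from the kernel, with no userspace buffering.  */
static void
fill_random (void *buf, size_t n)
{
  uint8_t *p = buf;

  while (n > 0)
    {
      /* Flags of 0 block until the kernel RNG has been initialized.
         The result is kept in a signed type so that -1 is detectable.  */
      ssize_t l = getrandom (p, n, 0);
      if (l < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted before any byte was read.  */
          abort ();             /* Assumption: treat other errors as fatal.  */
        }
      /* A short read is possible; advance and ask for the remainder.  */
      p += l;
      n -= (size_t) l;
    }
}
```

A caller wanting a single 32-bit value would pass the address of a uint32_t and sizeof (uint32_t), which is the shape arc4random() takes once it is layered on top of arc4random_buf().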
getrandom(0) is used, since that's the one that ensures the bytes returned are cryptographically secure. But on systems without it, we fall back to using /dev/urandom. This is unfortunate because it means opening a file descriptor, but there's not much of a choice. Secondly, as part of the fallback, in order to get more or less the same properties as getrandom(0), we poll on /dev/random, and if the poll succeeds at least once, then we assume the RNG is initialized. This is a rough approximation, as the ancient "non-blocking pool" initialized after the "blocking pool", not before, and it may not port back to all ancient kernels, though it does to all kernels supported by glibc (≥3.2), so generally it's the best approximation we can do. The motivation for including arc4random, in the first place, is to have source-level compatibility with existing code. That means this patch doesn't attempt to litigate the interface itself. It does, however, choose a conservative approach for implementing it. Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: Cristian Rodríguez <crrodriguez@opensuse.org> Cc: Paul Eggert <eggert@cs.ucla.edu> Cc: Mark Harris <mark.hsj@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: linux-crypto@vger.kernel.org Signed-off-by: Jason A.
Donenfeld <Jason@zx2c4.com> --- LICENSES | 23 - NEWS | 4 +- include/stdlib.h | 3 - manual/math.texi | 13 +- stdlib/Makefile | 2 - stdlib/arc4random.c | 196 ++---- stdlib/arc4random.h | 48 -- stdlib/chacha20.c | 191 ------ stdlib/tst-arc4random-chacha20.c | 167 ----- sysdeps/aarch64/Makefile | 4 - sysdeps/aarch64/chacha20-aarch64.S | 314 ---------- sysdeps/aarch64/chacha20_arch.h | 40 -- sysdeps/generic/chacha20_arch.h | 24 - sysdeps/generic/not-cancel.h | 3 + sysdeps/generic/tls-internal-struct.h | 1 - sysdeps/generic/tls-internal.c | 10 - sysdeps/mach/hurd/_Fork.c | 2 - sysdeps/mach/hurd/not-cancel.h | 4 + sysdeps/nptl/_Fork.c | 2 - .../powerpc/powerpc64/be/multiarch/Makefile | 4 - .../powerpc64/be/multiarch/chacha20-ppc.c | 1 - .../powerpc64/be/multiarch/chacha20_arch.h | 42 -- sysdeps/powerpc/powerpc64/power8/Makefile | 5 - .../powerpc/powerpc64/power8/chacha20-ppc.c | 256 -------- .../powerpc/powerpc64/power8/chacha20_arch.h | 37 -- sysdeps/s390/s390-64/Makefile | 6 - sysdeps/s390/s390-64/chacha20-s390x.S | 573 ------------------ sysdeps/s390/s390-64/chacha20_arch.h | 45 -- sysdeps/unix/sysv/linux/not-cancel.h | 8 +- sysdeps/unix/sysv/linux/tls-internal.c | 10 - sysdeps/unix/sysv/linux/tls-internal.h | 1 - sysdeps/x86_64/Makefile | 7 - sysdeps/x86_64/chacha20-amd64-avx2.S | 328 ---------- sysdeps/x86_64/chacha20-amd64-sse2.S | 311 ---------- sysdeps/x86_64/chacha20_arch.h | 55 -- 35 files changed, 64 insertions(+), 2676 deletions(-) delete mode 100644 stdlib/arc4random.h delete mode 100644 stdlib/chacha20.c delete mode 100644 stdlib/tst-arc4random-chacha20.c delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S delete mode 100644 sysdeps/aarch64/chacha20_arch.h delete mode 100644 sysdeps/generic/chacha20_arch.h delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h delete mode 100644 
sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S delete mode 100644 sysdeps/x86_64/chacha20_arch.h diff --git a/LICENSES b/LICENSES index cd04fb6e84..530893b1dc 100644 --- a/LICENSES +++ b/LICENSES @@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov> You should have received a copy of the GNU Lesser General Public License along with this library; if not, see <https://www.gnu.org/licenses/>. */ -\f -sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S, -sysdeps/x86_64/chacha20-amd64-avx2.S, and -sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and -sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt, -with the following notices: - -Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - -This file is part of Libgcrypt. - -Libgcrypt is free software; you can redistribute it and/or modify -it under the terms of the GNU Lesser General Public License as -published by the Free Software Foundation; either version 2.1 of -the License, or (at your option) any later version. - -Libgcrypt is distributed in the hope that it will be useful, -but WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -GNU Lesser General Public License for more details. - -You should have received a copy of the GNU Lesser General Public -License along with this program; if not, see <https://www.gnu.org/licenses/>. diff --git a/NEWS b/NEWS index 8420a65cd0..fe531bfe1e 100644 --- a/NEWS +++ b/NEWS @@ -61,8 +61,8 @@ Major new features: is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type). 
* The functions arc4random, arc4random_buf, and arc4random_uniform have been - added. The functions use a pseudo-random number generator along with - entropy from the kernel. + added. The functions wrap getrandom and/or /dev/urandom to return high- + quality randomness from the kernel. Deprecated and removed features, and other changes affecting compatibility: diff --git a/include/stdlib.h b/include/stdlib.h index cae7f7cdf8..db51f4a4f6 100644 --- a/include/stdlib.h +++ b/include/stdlib.h @@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform; libc_hidden_proto (__arc4random_uniform); extern void __arc4random_buf_internal (void *buffer, size_t len) attribute_hidden; -/* Called from the fork function to reinitialize the internal cipher state - in child process. */ -extern void __arc4random_fork_subprocess (void) attribute_hidden; extern double __strtod_internal (const char *__restrict __nptr, char **__restrict __endptr, int __group) diff --git a/manual/math.texi b/manual/math.texi index 141695cc30..6d69bbff66 100644 --- a/manual/math.texi +++ b/manual/math.texi @@ -1993,17 +1993,10 @@ This section describes the random number functions provided as a GNU extension, based on OpenBSD interfaces. @Theglibc{} uses kernel entropy obtained either through @code{getrandom} -or by reading @file{/dev/urandom} to seed and periodically re-seed the -internal state. A per-thread data pool is used, which allows fast output -generation. +or by reading @file{/dev/urandom} to seed. -Although these functions provide higher random quality than ISO, BSD, and -SVID functions, these still use a Pseudo-Random generator and should not -be used in cryptographic contexts. - -The internal state is cleared and reseeded with kernel entropy on @code{fork} -and @code{_Fork}. It is not cleared on either a direct @code{clone} syscall -or when using @theglibc{} @code{syscall} function. 
+These functions provide higher random quality than ISO, BSD, and SVID +functions, and may be used in cryptographic contexts. The prototypes for these functions are in @file{stdlib.h}. @pindex stdlib.h diff --git a/stdlib/Makefile b/stdlib/Makefile index a900962685..f7b25c1981 100644 --- a/stdlib/Makefile +++ b/stdlib/Makefile @@ -246,7 +246,6 @@ tests := \ # tests tests-internal := \ - tst-arc4random-chacha20 \ tst-strtod1i \ tst-strtod3 \ tst-strtod4 \ @@ -256,7 +255,6 @@ tests-internal := \ # tests-internal tests-static := \ - tst-arc4random-chacha20 \ tst-secure-getenv \ # tests-static diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c index 65547e79aa..0cb9991328 100644 --- a/stdlib/arc4random.c +++ b/stdlib/arc4random.c @@ -1,4 +1,4 @@ -/* Pseudo Random Number Generator based on ChaCha20. +/* Pseudo Random Number Generator Copyright (C) 2022 Free Software Foundation, Inc. This file is part of the GNU C Library. @@ -16,7 +16,6 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. */ -#include <arc4random.h> #include <errno.h> #include <not-cancel.h> #include <stdio.h> @@ -24,53 +23,6 @@ #include <sys/mman.h> #include <sys/param.h> #include <sys/random.h> -#include <tls-internal.h> - -/* arc4random keeps two counters: 'have' is the current valid bytes not yet - consumed in 'buf' while 'count' is the maximum number of bytes until a - reseed. - - Both the initial seed and reseed try to obtain entropy from the kernel - and abort the process if none could be obtained. - - The state 'buf' improves the usage of the cipher calls, allowing to call - optimized implementations (if the architecture provides it) and minimize - function call overhead. */ - -#include <chacha20.c> - -/* Called from the fork function to reset the state. 
*/ -void -__arc4random_fork_subprocess (void) -{ - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state; - if (state != NULL) - { - explicit_bzero (state, sizeof (*state)); - /* Force key init. */ - state->count = -1; - } -} - -/* Return the current thread random state or try to create one if there is - none available. In the case malloc can not allocate a state, arc4random - will try to get entropy with arc4random_getentropy. */ -static struct arc4random_state_t * -arc4random_get_state (void) -{ - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state; - if (state == NULL) - { - state = malloc (sizeof (struct arc4random_state_t)); - if (state != NULL) - { - /* Force key initialization on first call. */ - state->count = -1; - __glibc_tls_internal ()->rand_state = state; - } - } - return state; -} static void arc4random_getrandom_failure (void) @@ -78,106 +30,63 @@ arc4random_getrandom_failure (void) __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n"); } -static void -arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen) +void +__arc4random_buf (void *p, size_t n) { - chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf); - - /* Mix optional user provided data. */ - if (rnd != NULL) - { - size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); - for (size_t i = 0; i < m; i++) - state->buf[i] ^= rnd[i]; - } - - /* Immediately reinit for backtracking resistance. 
*/ - chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE); - explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); - state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); -} + static int seen_initialized; + size_t l; + int fd; -static void -arc4random_getentropy (void *rnd, size_t len) -{ - if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len) + if (n == 0) return; - int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom", - O_RDONLY | O_CLOEXEC)); - if (fd != -1) + for (;;) { - uint8_t *p = rnd; - uint8_t *end = p + len; - do + l = TEMP_FAILURE_RETRY (__getrandom_nocancel (p, n, 0)); + if (l > 0) { - ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p)); - if (ret <= 0) - arc4random_getrandom_failure (); - p += ret; + if ((size_t) l == n) + return; /* Done reading, success. */ + p = (uint8_t *) p + l; + n -= l; + continue; /* Interrupted by a signal; keep going. */ } - while (p < end); - - if (__close_nocancel (fd) == 0) - return; + else if (l < 0 && errno == ENOSYS) + break; /* No syscall, so fallback to /dev/urandom. */ + arc4random_getrandom_failure (); } - arc4random_getrandom_failure (); -} -/* Check if the thread context STATE should be reseed with kernel entropy - depending of requested LEN bytes. If there is less than requested, - the state is either initialized or reseeded, otherwise the internal - counter subtract the requested length. */ -static void -arc4random_check_stir (struct arc4random_state_t *state, size_t len) -{ - if (state->count <= len || state->count == -1) + if (!atomic_load_relaxed (&seen_initialized)) { - uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE]; - arc4random_getentropy (rnd, sizeof rnd); - - if (state->count == -1) - chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE); - else - arc4random_rekey (state, rnd, sizeof rnd); - - explicit_bzero (rnd, sizeof rnd); - - /* Invalidate the buf. 
*/ - state->have = 0; - memset (state->buf, 0, sizeof state->buf); - state->count = CHACHA20_RESEED_SIZE; + /* Poll /dev/random as an approximation of RNG initialization. */ + struct pollfd pfd = { .events = POLLIN }; + pfd.fd = TEMP_FAILURE_RETRY ( + __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY)); + if (pfd.fd < 0) + arc4random_getrandom_failure (); + if (TEMP_FAILURE_RETRY (__poll_infinity_nocancel (&pfd, 1)) < 0) + arc4random_getrandom_failure (); + if (__close_nocancel (pfd.fd) < 0) + arc4random_getrandom_failure (); + atomic_store_relaxed (&seen_initialized, 1); } - else - state->count -= len; -} -void -__arc4random_buf (void *buffer, size_t len) -{ - struct arc4random_state_t *state = arc4random_get_state (); - if (__glibc_unlikely (state == NULL)) - { - arc4random_getentropy (buffer, len); - return; - } - - arc4random_check_stir (state, len); - while (len > 0) + fd = TEMP_FAILURE_RETRY ( + __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY)); + if (fd < 0) + arc4random_getrandom_failure (); + for (;;) { - if (state->have > 0) - { - size_t m = MIN (len, state->have); - uint8_t *ks = state->buf + sizeof (state->buf) - state->have; - memcpy (buffer, ks, m); - explicit_bzero (ks, m); - buffer += m; - len -= m; - state->have -= m; - } - if (state->have == 0) - arc4random_rekey (state, NULL, 0); + l = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n)); + if (l <= 0) + arc4random_getrandom_failure (); + if ((size_t) l == n) + break; /* Done reading, success. 
*/ + p = (uint8_t *) p + l; + n -= l; } + if (__close_nocancel (fd) < 0) + arc4random_getrandom_failure (); } libc_hidden_def (__arc4random_buf) weak_alias (__arc4random_buf, arc4random_buf) @@ -186,22 +95,7 @@ uint32_t __arc4random (void) { uint32_t r; - - struct arc4random_state_t *state = arc4random_get_state (); - if (__glibc_unlikely (state == NULL)) - { - arc4random_getentropy (&r, sizeof (uint32_t)); - return r; - } - - arc4random_check_stir (state, sizeof (uint32_t)); - if (state->have < sizeof (uint32_t)) - arc4random_rekey (state, NULL, 0); - uint8_t *ks = state->buf + sizeof (state->buf) - state->have; - memcpy (&r, ks, sizeof (uint32_t)); - memset (ks, 0, sizeof (uint32_t)); - state->have -= sizeof (uint32_t); - + __arc4random_buf (&r, sizeof (r)); return r; } libc_hidden_def (__arc4random) diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h deleted file mode 100644 index cd39389c19..0000000000 --- a/stdlib/arc4random.h +++ /dev/null @@ -1,48 +0,0 @@ -/* Arc4random definition used on TLS. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#ifndef _CHACHA20_H -#define _CHACHA20_H - -#include <stddef.h> -#include <stdint.h> - -/* Internal ChaCha20 state. 
*/ -#define CHACHA20_STATE_LEN 16 -#define CHACHA20_BLOCK_SIZE 64 - -/* Maximum number bytes until reseed (16 MB). */ -#define CHACHA20_RESEED_SIZE (16 * 1024 * 1024) - -/* Internal arc4random buffer, used on each feedback step so offer some - backtracking protection and to allow better used of vectorized - chacha20 implementations. */ -#define CHACHA20_BUFSIZE (8 * CHACHA20_BLOCK_SIZE) - -_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE"); - -struct arc4random_state_t -{ - uint32_t ctx[CHACHA20_STATE_LEN]; - size_t have; - size_t count; - uint8_t buf[CHACHA20_BUFSIZE]; -}; - -#endif diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c deleted file mode 100644 index 2745a81315..0000000000 --- a/stdlib/chacha20.c +++ /dev/null @@ -1,191 +0,0 @@ -/* Generic ChaCha20 implementation (used on arc4random). - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <array_length.h> -#include <endian.h> -#include <stddef.h> -#include <stdint.h> -#include <string.h> - -/* 32-bit stream position, then 96-bit nonce. 
*/ -#define CHACHA20_IV_SIZE 16 -#define CHACHA20_KEY_SIZE 32 - -#define CHACHA20_STATE_LEN 16 - -/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final - XOR of the keystream with the plaintext because the plaintext is a - stream of zeros. */ - -enum chacha20_constants -{ - CHACHA20_CONSTANT_EXPA = 0x61707865U, - CHACHA20_CONSTANT_ND_3 = 0x3320646eU, - CHACHA20_CONSTANT_2_BY = 0x79622d32U, - CHACHA20_CONSTANT_TE_K = 0x6b206574U -}; - -static inline uint32_t -read_unaligned_32 (const uint8_t *p) -{ - uint32_t r; - memcpy (&r, p, sizeof (r)); - return r; -} - -static inline void -write_unaligned_32 (uint8_t *p, uint32_t v) -{ - memcpy (p, &v, sizeof (v)); -} - -#if __BYTE_ORDER == __BIG_ENDIAN -# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p)) -# define set_state(v) __builtin_bswap32 ((v)) -#else -# define read_unaligned_le32(p) read_unaligned_32 ((p)) -# define set_state(v) (v) -#endif - -static inline void -chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv) -{ - state[0] = CHACHA20_CONSTANT_EXPA; - state[1] = CHACHA20_CONSTANT_ND_3; - state[2] = CHACHA20_CONSTANT_2_BY; - state[3] = CHACHA20_CONSTANT_TE_K; - - state[4] = read_unaligned_le32 (key + 0 * sizeof (uint32_t)); - state[5] = read_unaligned_le32 (key + 1 * sizeof (uint32_t)); - state[6] = read_unaligned_le32 (key + 2 * sizeof (uint32_t)); - state[7] = read_unaligned_le32 (key + 3 * sizeof (uint32_t)); - state[8] = read_unaligned_le32 (key + 4 * sizeof (uint32_t)); - state[9] = read_unaligned_le32 (key + 5 * sizeof (uint32_t)); - state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t)); - state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t)); - - state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t)); - state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t)); - state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t)); - state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t)); -} - -static inline uint32_t -rotl32 
(unsigned int shift, uint32_t word) -{ - return (word << (shift & 31)) | (word >> ((-shift) & 31)); -} - -static void -state_final (const uint8_t *src, uint8_t *dst, uint32_t v) -{ -#ifdef CHACHA20_XOR_FINAL - v ^= read_unaligned_32 (src); -#endif - write_unaligned_32 (dst, v); -} - -static inline void -chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src) -{ - uint32_t x0, x1, x2, x3, x4, x5, x6, x7; - uint32_t x8, x9, x10, x11, x12, x13, x14, x15; - - x0 = state[0]; - x1 = state[1]; - x2 = state[2]; - x3 = state[3]; - x4 = state[4]; - x5 = state[5]; - x6 = state[6]; - x7 = state[7]; - x8 = state[8]; - x9 = state[9]; - x10 = state[10]; - x11 = state[11]; - x12 = state[12]; - x13 = state[13]; - x14 = state[14]; - x15 = state[15]; - - for (int i = 0; i < 20; i += 2) - { -#define QROUND(_x0, _x1, _x2, _x3) \ - do { \ - _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); \ - _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); \ - _x0 = _x0 + _x1; _x3 = rotl32 (8, (_x0 ^ _x3)); \ - _x2 = _x2 + _x3; _x1 = rotl32 (7, (_x1 ^ _x2)); \ - } while(0) - - QROUND (x0, x4, x8, x12); - QROUND (x1, x5, x9, x13); - QROUND (x2, x6, x10, x14); - QROUND (x3, x7, x11, x15); - - QROUND (x0, x5, x10, x15); - QROUND (x1, x6, x11, x12); - QROUND (x2, x7, x8, x13); - QROUND (x3, x4, x9, x14); - } - - state_final (&src[0], &dst[0], set_state (x0 + state[0])); - state_final (&src[4], &dst[4], set_state (x1 + state[1])); - state_final (&src[8], &dst[8], set_state (x2 + state[2])); - state_final (&src[12], &dst[12], set_state (x3 + state[3])); - state_final (&src[16], &dst[16], set_state (x4 + state[4])); - state_final (&src[20], &dst[20], set_state (x5 + state[5])); - state_final (&src[24], &dst[24], set_state (x6 + state[6])); - state_final (&src[28], &dst[28], set_state (x7 + state[7])); - state_final (&src[32], &dst[32], set_state (x8 + state[8])); - state_final (&src[36], &dst[36], set_state (x9 + state[9])); - state_final (&src[40], &dst[40], set_state (x10 + state[10])); - 
state_final (&src[44], &dst[44], set_state (x11 + state[11])); - state_final (&src[48], &dst[48], set_state (x12 + state[12])); - state_final (&src[52], &dst[52], set_state (x13 + state[13])); - state_final (&src[56], &dst[56], set_state (x14 + state[14])); - state_final (&src[60], &dst[60], set_state (x15 + state[15])); - - state[12]++; -} - -static void -__attribute_maybe_unused__ -chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - while (bytes >= CHACHA20_BLOCK_SIZE) - { - chacha20_block (state, dst, src); - - bytes -= CHACHA20_BLOCK_SIZE; - dst += CHACHA20_BLOCK_SIZE; - src += CHACHA20_BLOCK_SIZE; - } - - if (__glibc_unlikely (bytes != 0)) - { - uint8_t stream[CHACHA20_BLOCK_SIZE]; - chacha20_block (state, stream, src); - memcpy (dst, stream, bytes); - explicit_bzero (stream, sizeof stream); - } -} - -/* Get the architecture optimized version. */ -#include <chacha20_arch.h> diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c deleted file mode 100644 index 45ba54920d..0000000000 --- a/stdlib/tst-arc4random-chacha20.c +++ /dev/null @@ -1,167 +0,0 @@ -/* Basic tests for chacha20 cypher used in arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <arc4random.h> -#include <support/check.h> -#include <sys/cdefs.h> - -/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random - actual does. */ -#include <chacha20.c> - -static int -do_test (void) -{ - const uint8_t key[CHACHA20_KEY_SIZE] = - { - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - }; - const uint8_t iv[CHACHA20_IV_SIZE] = - { - 0x0, 0x0, 0x0, 0x0, - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, - }; - const uint8_t expected1[CHACHA20_BUFSIZE] = - { - 0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a, - 0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d, - 0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda, - 0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f, - 0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1, - 0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07, - 0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73, - 0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69, - 0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21, - 0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8, - 0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac, - 0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6, - 0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0, - 0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3, - 0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f, - 0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e, - 0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71, - 0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3, - 0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5, - 0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34, - 0xb4, 
0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d, - 0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c, - 0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31, - 0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad, - 0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0, - 0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72, - 0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16, - 0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9, - 0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a, - 0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc, - 0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2, - 0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb, - 0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10, - 0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b, - 0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2, - 0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17, - 0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca, - 0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84, - 0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3, - 0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1, - 0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7, - 0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9, - 0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3, - 0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e, - 0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb, - 0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8, - 0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb - }; - - const uint8_t expected2[CHACHA20_BUFSIZE] = - { - 0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36, - 0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22, - 0xf3, 
0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50, - 0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10, - 0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef, - 0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30, - 0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c, - 0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6, - 0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e, - 0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90, - 0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3, - 0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba, - 0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4, - 0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea, - 0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a, - 0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80, - 0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2, - 0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8, - 0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce, - 0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb, - 0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9, - 0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc, - 0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86, - 0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0, - 0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4, - 0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1, - 0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4, - 0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f, - 0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57, - 0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97, - 0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef, - 0x55, 0x83, 0x64, 0xae, 0x81, 
0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13, - 0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65, - 0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30, - 0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6, - 0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee, - 0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c, - 0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66, - 0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a, - 0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c, - 0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65, - 0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b, - 0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88, - 0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1, - 0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0, - 0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36, - 0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18 - }; - - /* Check with the expected internal arc4random keystream buffer. Some - architecture optimizations expects a buffer with a minimum size which - is a multiple of then ChaCha20 blocksize, so they might not be prepared - to handle smaller buffers. */ - - uint8_t output[CHACHA20_BUFSIZE]; - - uint32_t state[CHACHA20_STATE_LEN]; - chacha20_init (state, key, iv); - - /* Check with the initial state. */ - uint8_t input[CHACHA20_BUFSIZE] = { 0 }; - - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); - TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE); - - /* And on the next round. 
*/ - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); - TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE); - - return 0; -} - -#include <support/test-driver.c> diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile index 7dfd1b62dd..17fb1c5b72 100644 --- a/sysdeps/aarch64/Makefile +++ b/sysdeps/aarch64/Makefile @@ -51,10 +51,6 @@ ifeq ($(subdir),csu) gen-as-const-headers += tlsdesc.sym endif -ifeq ($(subdir),stdlib) -sysdep_routines += chacha20-aarch64 -endif - ifeq ($(subdir),gmon) CFLAGS-mcount.c += -mgeneral-regs-only endif diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S deleted file mode 100644 index cce5291c5c..0000000000 --- a/sysdeps/aarch64/chacha20-aarch64.S +++ /dev/null @@ -1,314 +0,0 @@ -/* Optimized AArch64 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. 
- - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. - */ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. */ - -#include <sysdep.h> - -/* Only LE is supported. */ -#ifdef __AARCH64EL__ - -#define GET_DATA_POINTER(reg, name) \ - adrp reg, name ; \ - add reg, reg, :lo12:name - -/* 'ret' instruction replacement for straight-line speculation mitigation */ -#define ret_spec_stop \ - ret; dsb sy; isb; - -.cpu generic+simd - -.text - -/* register macros */ -#define INPUT x0 -#define DST x1 -#define SRC x2 -#define NBLKS x3 -#define ROUND x4 -#define INPUT_CTR x5 -#define INPUT_POS x6 -#define CTR x7 - -/* vector registers */ -#define X0 v16 -#define X4 v17 -#define X8 v18 -#define X12 v19 - -#define X1 v20 -#define X5 v21 - -#define X9 v22 -#define X13 v23 -#define X2 v24 -#define X6 v25 - -#define X3 v26 -#define X7 v27 -#define X11 v28 -#define X15 v29 - -#define X10 v30 -#define X14 v31 - -#define VCTR v0 -#define VTMP0 v1 -#define VTMP1 v2 -#define VTMP2 v3 -#define VTMP3 v4 -#define X12_TMP v5 -#define X13_TMP v6 -#define ROT8 v7 - -/********************************************************************** - helper macros - **********************************************************************/ - -#define _(...) 
__VA_ARGS__ - -#define vpunpckldq(s1, s2, dst) \ - zip1 dst.4s, s2.4s, s1.4s; - -#define vpunpckhdq(s1, s2, dst) \ - zip2 dst.4s, s2.4s, s1.4s; - -#define vpunpcklqdq(s1, s2, dst) \ - zip1 dst.2d, s2.2d, s1.2d; - -#define vpunpckhqdq(s1, s2, dst) \ - zip2 dst.2d, s2.2d, s1.2d; - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \ - vpunpckhdq(x1, x0, t2); \ - vpunpckldq(x1, x0, x0); \ - \ - vpunpckldq(x3, x2, t1); \ - vpunpckhdq(x3, x2, x2); \ - \ - vpunpckhqdq(t1, x0, x1); \ - vpunpcklqdq(t1, x0, x0); \ - \ - vpunpckhqdq(x2, t2, x3); \ - vpunpcklqdq(x2, t2, x2); - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define XOR(d,s1,s2) \ - eor d.16b, s2.16b, s1.16b; - -#define PLUS(ds,s) \ - add ds.4s, ds.4s, s.4s; - -#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \ - shl dst1.4s, src1.4s, #(c); \ - shl dst2.4s, src2.4s, #(c); \ - shl dst3.4s, src3.4s, #(c); \ - shl dst4.4s, src4.4s, #(c); \ - sri dst1.4s, src1.4s, #(32 - (c)); \ - sri dst2.4s, src2.4s, #(32 - (c)); \ - sri dst3.4s, src3.4s, #(32 - (c)); \ - sri dst4.4s, src4.4s, #(32 - (c)); - -#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ - tbl dst1.16b, {src1.16b}, ROT8.16b; \ - tbl dst2.16b, {src2.16b}, ROT8.16b; \ - tbl dst3.16b, {src3.16b}, ROT8.16b; \ - tbl dst4.16b, {src4.16b}, ROT8.16b; - -#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ - rev32 dst1.8h, src1.8h; \ - rev32 dst2.8h, src2.8h; \ - rev32 dst3.8h, src3.8h; \ - rev32 dst4.8h, src4.8h; - -#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \ - PLUS(a1,b1); PLUS(a2,b2); \ - PLUS(a3,b3); PLUS(a4,b4); \ - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ - ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4); \ - PLUS(c1,d1); PLUS(c2,d2); \ - PLUS(c3,d3); PLUS(c4,d4); \ - XOR(tmp1,b1,c1); 
XOR(tmp2,b2,c2); \ - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ - ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4) \ - PLUS(a1,b1); PLUS(a2,b2); \ - PLUS(a3,b3); PLUS(a4,b4); \ - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ - ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4) \ - PLUS(c1,d1); PLUS(c2,d2); \ - PLUS(c3,d3); PLUS(c4,d4); \ - XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \ - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ - ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4) \ - -.align 4 -L(__chacha20_blocks4_data_inc_counter): - .long 0,1,2,3 - -.align 4 -L(__chacha20_blocks4_data_rot8): - .byte 3,0,1,2 - .byte 7,4,5,6 - .byte 11,8,9,10 - .byte 15,12,13,14 - -.hidden __chacha20_neon_blocks4 -ENTRY (__chacha20_neon_blocks4) - /* input: - * x0: input - * x1: dst - * x2: src - * x3: nblks (multiple of 4) - */ - - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8)) - add INPUT_CTR, INPUT, #(12*4); - ld1 {ROT8.16b}, [CTR]; - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter)) - mov INPUT_POS, INPUT; - ld1 {VCTR.16b}, [CTR]; - -L(loop4): - /* Construct counter vectors X12 and X13 */ - - ld1 {X15.16b}, [INPUT_CTR]; - mov ROUND, #20; - ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS]; - - dup X12.4s, X15.s[0]; - dup X13.4s, X15.s[1]; - ldr CTR, [INPUT_CTR]; - add X12.4s, X12.4s, VCTR.4s; - dup X0.4s, VTMP1.s[0]; - dup X1.4s, VTMP1.s[1]; - dup X2.4s, VTMP1.s[2]; - dup X3.4s, VTMP1.s[3]; - dup X14.4s, X15.s[2]; - cmhi VTMP0.4s, VCTR.4s, X12.4s; - dup X15.4s, X15.s[3]; - add CTR, CTR, #4; /* Update counter */ - dup X4.4s, VTMP2.s[0]; - dup X5.4s, VTMP2.s[1]; - dup X6.4s, VTMP2.s[2]; - dup X7.4s, VTMP2.s[3]; - sub X13.4s, X13.4s, VTMP0.4s; - dup X8.4s, VTMP3.s[0]; - dup X9.4s, VTMP3.s[1]; - dup X10.4s, VTMP3.s[2]; - dup X11.4s, VTMP3.s[3]; - mov X12_TMP.16b, X12.16b; - mov X13_TMP.16b, X13.16b; - str CTR, [INPUT_CTR]; - -L(round2): - subs ROUND, ROUND, #2 - QUARTERROUND4(X0, X4, X8, X12, X1, X5, X9, X13, - X2, X6, X10, X14, X3, X7, X11, X15, - 
tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) - QUARTERROUND4(X0, X5, X10, X15, X1, X6, X11, X12, - X2, X7, X8, X13, X3, X4, X9, X14, - tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) - b.ne L(round2); - - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32; - - PLUS(X12, X12_TMP); /* INPUT + 12 * 4 + counter */ - PLUS(X13, X13_TMP); /* INPUT + 13 * 4 + counter */ - - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */ - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */ - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */ - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */ - PLUS(X0, VTMP2); - PLUS(X1, VTMP3); - PLUS(X2, X12_TMP); - PLUS(X3, X13_TMP); - - dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */ - dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */ - dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */ - dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */ - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS]; - mov INPUT_POS, INPUT; - PLUS(X4, VTMP2); - PLUS(X5, VTMP3); - PLUS(X6, X12_TMP); - PLUS(X7, X13_TMP); - - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */ - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */ - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */ - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */ - dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */ - dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */ - PLUS(X8, VTMP2); - PLUS(X9, VTMP3); - PLUS(X10, X12_TMP); - PLUS(X11, X13_TMP); - PLUS(X14, VTMP0); - PLUS(X15, VTMP1); - - transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2); - transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2); - transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2); - transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2); - - subs NBLKS, NBLKS, #4; - - st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64 - st1 {X1.16b,X5.16b}, [DST], #32; - st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64 - st1 {X10.16b,X14.16b}, [DST], #32; - st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64; - - b.ne L(loop4); - - ret_spec_stop -END (__chacha20_neon_blocks4) - -#endif diff --git a/sysdeps/aarch64/chacha20_arch.h 
b/sysdeps/aarch64/chacha20_arch.h
deleted file mode 100644
index 37dbb917f1..0000000000
--- a/sysdeps/aarch64/chacha20_arch.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/* Chacha20 implementation, used on arc4random.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>. */
-
-#include <ldsodefs.h>
-#include <stdbool.h>
-
-unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
-                                      const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-                size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-                  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
-                  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
-#ifdef __AARCH64EL__
-  __chacha20_neon_blocks4 (state, dst, src,
-                           CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
-  chacha20_crypt_generic (state, dst, src, bytes);
-#endif
-}
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
deleted file mode 100644
index 1b4559ccbc..0000000000
--- a/sysdeps/generic/chacha20_arch.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* Chacha20 implementation, generic interface for encrypt.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>. */
-
-static inline void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-                size_t bytes)
-{
-  chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/generic/not-cancel.h b/sysdeps/generic/not-cancel.h
index acceb9b67f..b5a42c70d6 100644
--- a/sysdeps/generic/not-cancel.h
+++ b/sysdeps/generic/not-cancel.h
@@ -20,6 +20,7 @@
 # define NOT_CANCEL_H
 
 #include <fcntl.h>
+#include <poll.h>
 #include <unistd.h>
 #include <sys/wait.h>
 #include <time.h>
@@ -50,5 +51,7 @@
   __fcntl64 (fd, cmd, __VA_ARGS__)
 #define __getrandom_nocancel(buf, size, flags) \
   __getrandom (buf, size, flags)
+#define __poll_infinity_nocancel(fds, nfds) \
+  __poll (fds, nfds, -1)
 
 #endif /* NOT_CANCEL_H */
diff --git a/sysdeps/generic/tls-internal-struct.h b/sysdeps/generic/tls-internal-struct.h
index a91915831b..d76c715a96 100644
--- a/sysdeps/generic/tls-internal-struct.h
+++ b/sysdeps/generic/tls-internal-struct.h
@@ -23,7 +23,6 @@ struct tls_internal_t
 {
   char *strsignal_buf;
   char *strerror_l_buf;
-  struct arc4random_state_t *rand_state;
 };
 
 #endif
diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
index 8a0f37d509..b32b31b5a9 100644
--- a/sysdeps/generic/tls-internal.c
+++ b/sysdeps/generic/tls-internal.c
@@ -16,7 +16,6 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>. */
 
-#include <stdlib/arc4random.h>
 #include <string.h>
 #include <tls-internal.h>
 
@@ -27,13 +26,4 @@ __glibc_tls_internal_free (void)
 {
   free (__tls_internal.strsignal_buf);
   free (__tls_internal.strerror_l_buf);
-
-  if (__tls_internal.rand_state != NULL)
-    {
-      /* Clear any lingering random state prior so if the thread stack is
-         cached it won't leak any data. */
-      explicit_bzero (__tls_internal.rand_state,
-                      sizeof (*__tls_internal.rand_state));
-      free (__tls_internal.rand_state);
-    }
 }
diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
index 667068c8cf..e60b86fab1 100644
--- a/sysdeps/mach/hurd/_Fork.c
+++ b/sysdeps/mach/hurd/_Fork.c
@@ -662,8 +662,6 @@ retry:
       _hurd_malloc_fork_child ();
       call_function_static_weak (__malloc_fork_unlock_child);
 
-      call_function_static_weak (__arc4random_fork_subprocess);
-
       /* Run things that want to run in the child task to set up. */
       RUN_HOOK (_hurd_fork_child_hook, ());
 
diff --git a/sysdeps/mach/hurd/not-cancel.h b/sysdeps/mach/hurd/not-cancel.h
index 9a3a7ed59a..ae58b734e3 100644
--- a/sysdeps/mach/hurd/not-cancel.h
+++ b/sysdeps/mach/hurd/not-cancel.h
@@ -21,6 +21,7 @@
 
 #include <fcntl.h>
 #include <unistd.h>
+#include <poll.h>
 #include <sys/wait.h>
 #include <time.h>
 #include <sys/uio.h>
@@ -77,6 +78,9 @@ __typeof (__fcntl) __fcntl_nocancel;
 #define __getrandom_nocancel(buf, size, flags) \
   __getrandom (buf, size, flags)
 
+#define __poll_infinity_nocancel(fds, nfds) \
+  __poll (fds, nfds, -1)
+
 #if IS_IN (libc)
 hidden_proto (__close_nocancel)
 hidden_proto (__close_nocancel_nostatus)
diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index 7dc02569f6..dd568992e2 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -43,8 +43,6 @@ _Fork (void)
       self->robust_head.list = &self->robust_head;
       INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
                              sizeof (struct robust_list_head));
-
-      call_function_static_weak (__arc4random_fork_subprocess);
     }
   return pid;
 }
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
deleted file mode 100644
index 8c75165f7f..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
+++ /dev/null
@@ -1,4 +0,0 @@
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
deleted file mode 100644
index cf9e735326..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
deleted file mode 100644
index 08494dc045..0000000000
--- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
+++ /dev/null
@@ -1,42 +0,0 @@
-/* PowerPC optimization for ChaCha20.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>. */
-
-#include <stdbool.h>
-#include <ldsodefs.h>
-
-unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
-                                        const uint8_t *src, size_t nblks)
-     attribute_hidden;
-
-static void
-chacha20_crypt (uint32_t *state, uint8_t *dst,
-                const uint8_t *src, size_t bytes)
-{
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-                  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-                  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
-
-  unsigned long int hwcap = GLRO(dl_hwcap);
-  unsigned long int hwcap2 = GLRO(dl_hwcap2);
-  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
-    __chacha20_power8_blocks4 (state, dst, src,
-                               CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-  else
-    chacha20_crypt_generic (state, dst, src, bytes);
-}
diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
index abb0aa3f11..71a59529f3 100644
--- a/sysdeps/powerpc/powerpc64/power8/Makefile
+++ b/sysdeps/powerpc/powerpc64/power8/Makefile
@@ -1,8 +1,3 @@
 ifeq ($(subdir),string)
 sysdep_routines += strcasestr-ppc64
 endif
-
-ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-endif
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
deleted file mode 100644
index 0bbdcb9363..0000000000
--- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/* Optimized PowerPC implementation of ChaCha20 cipher.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>. */
-
-/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20
-   Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
-
-   This file is part of Libgcrypt.
-
-   Libgcrypt is free software; you can redistribute it and/or modify
-   it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of
-   the License, or (at your option) any later version.
-
-   Libgcrypt is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-   GNU Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with this program; if not, see <https://www.gnu.org/licenses/>.
- */ - -#include <altivec.h> -#include <endian.h> -#include <stddef.h> -#include <stdint.h> -#include <sys/cdefs.h> - -typedef vector unsigned char vector16x_u8; -typedef vector unsigned int vector4x_u32; -typedef vector unsigned long long vector2x_u64; - -#if __BYTE_ORDER == __BIG_ENDIAN -static const vector16x_u8 le_bswap_const = - { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 }; -#endif - -static inline vector4x_u32 -vec_rol_elems (vector4x_u32 v, unsigned int idx) -{ -#if __BYTE_ORDER != __BIG_ENDIAN - return vec_sld (v, v, (16 - (4 * idx)) & 15); -#else - return vec_sld (v, v, (4 * idx) & 15); -#endif -} - -static inline vector4x_u32 -vec_load_le (unsigned long offset, const unsigned char *ptr) -{ - vector4x_u32 vec; - vec = vec_vsx_ld (offset, (const uint32_t *)ptr); -#if __BYTE_ORDER == __BIG_ENDIAN - vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec, - le_bswap_const); -#endif - return vec; -} - -static inline void -vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr) -{ -#if __BYTE_ORDER == __BIG_ENDIAN - vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec, - le_bswap_const); -#endif - vec_vsx_st (vec, offset, (uint32_t *)ptr); -} - - -static inline vector4x_u32 -vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a) -{ -#if __BYTE_ORDER == __BIG_ENDIAN - static const vector16x_u8 swap32 = - { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 }; - vector2x_u64 vec, add, sum; - - vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32); - add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32); - sum = vec + add; - return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32); -#else - return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a)); -#endif -} - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define ROTATE(v1,rolv) \ - 
__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv)) - -#define PLUS(ds,s) \ - ((ds) += (s)) - -#define XOR(ds,s) \ - ((ds) ^= (s)) - -#define ADD_U64(v,a) \ - (v = vec_add_ctr_u64(v, a)) - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0, x1, x2, x3) ({ \ - vector4x_u32 t1 = vec_mergeh(x0, x2); \ - vector4x_u32 t2 = vec_mergel(x0, x2); \ - vector4x_u32 t3 = vec_mergeh(x1, x3); \ - x3 = vec_mergel(x1, x3); \ - x0 = vec_mergeh(t1, t3); \ - x1 = vec_mergel(t1, t3); \ - x2 = vec_mergeh(t2, x3); \ - x3 = vec_mergel(t2, x3); \ - }) - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2) \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE(d1, rotate_16); ROTATE(d2, rotate_16); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE(b1, rotate_12); ROTATE(b2, rotate_12); \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE(d1, rotate_8); ROTATE(d2, rotate_8); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE(b1, rotate_7); ROTATE(b2, rotate_7); - -unsigned int attribute_hidden -__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t nblks) -{ - vector4x_u32 counters_0123 = { 0, 1, 2, 3 }; - vector4x_u32 counter_4 = { 4, 0, 0, 0 }; - vector4x_u32 rotate_16 = { 16, 16, 16, 16 }; - vector4x_u32 rotate_12 = { 12, 12, 12, 12 }; - vector4x_u32 rotate_8 = { 8, 8, 8, 8 }; - vector4x_u32 rotate_7 = { 7, 7, 7, 7 }; - vector4x_u32 state0, state1, state2, state3; - vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7; - vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15; - vector4x_u32 tmp; - int i; - - /* Force preload of constants to vector registers. 
*/ - __asm__ ("": "+v" (counters_0123) :: "memory"); - __asm__ ("": "+v" (counter_4) :: "memory"); - __asm__ ("": "+v" (rotate_16) :: "memory"); - __asm__ ("": "+v" (rotate_12) :: "memory"); - __asm__ ("": "+v" (rotate_8) :: "memory"); - __asm__ ("": "+v" (rotate_7) :: "memory"); - - state0 = vec_vsx_ld (0 * 16, state); - state1 = vec_vsx_ld (1 * 16, state); - state2 = vec_vsx_ld (2 * 16, state); - state3 = vec_vsx_ld (3 * 16, state); - - do - { - v0 = vec_splat (state0, 0); - v1 = vec_splat (state0, 1); - v2 = vec_splat (state0, 2); - v3 = vec_splat (state0, 3); - v4 = vec_splat (state1, 0); - v5 = vec_splat (state1, 1); - v6 = vec_splat (state1, 2); - v7 = vec_splat (state1, 3); - v8 = vec_splat (state2, 0); - v9 = vec_splat (state2, 1); - v10 = vec_splat (state2, 2); - v11 = vec_splat (state2, 3); - v12 = vec_splat (state3, 0); - v13 = vec_splat (state3, 1); - v14 = vec_splat (state3, 2); - v15 = vec_splat (state3, 3); - - v12 += counters_0123; - v13 -= vec_cmplt (v12, counters_0123); - - for (i = 20; i > 0; i -= 2) - { - QUARTERROUND2 (v0, v4, v8, v12, v1, v5, v9, v13) - QUARTERROUND2 (v2, v6, v10, v14, v3, v7, v11, v15) - QUARTERROUND2 (v0, v5, v10, v15, v1, v6, v11, v12) - QUARTERROUND2 (v2, v7, v8, v13, v3, v4, v9, v14) - } - - v0 += vec_splat (state0, 0); - v1 += vec_splat (state0, 1); - v2 += vec_splat (state0, 2); - v3 += vec_splat (state0, 3); - v4 += vec_splat (state1, 0); - v5 += vec_splat (state1, 1); - v6 += vec_splat (state1, 2); - v7 += vec_splat (state1, 3); - v8 += vec_splat (state2, 0); - v9 += vec_splat (state2, 1); - v10 += vec_splat (state2, 2); - v11 += vec_splat (state2, 3); - tmp = vec_splat( state3, 0); - tmp += counters_0123; - v12 += tmp; - v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123); - v14 += vec_splat (state3, 2); - v15 += vec_splat (state3, 3); - ADD_U64 (state3, counter_4); - - transpose_4x4 (v0, v1, v2, v3); - transpose_4x4 (v4, v5, v6, v7); - transpose_4x4 (v8, v9, v10, v11); - transpose_4x4 (v12, v13, v14, v15); 
- - vec_store_le (v0, (64 * 0 + 16 * 0), dst); - vec_store_le (v1, (64 * 1 + 16 * 0), dst); - vec_store_le (v2, (64 * 2 + 16 * 0), dst); - vec_store_le (v3, (64 * 3 + 16 * 0), dst); - - vec_store_le (v4, (64 * 0 + 16 * 1), dst); - vec_store_le (v5, (64 * 1 + 16 * 1), dst); - vec_store_le (v6, (64 * 2 + 16 * 1), dst); - vec_store_le (v7, (64 * 3 + 16 * 1), dst); - - vec_store_le (v8, (64 * 0 + 16 * 2), dst); - vec_store_le (v9, (64 * 1 + 16 * 2), dst); - vec_store_le (v10, (64 * 2 + 16 * 2), dst); - vec_store_le (v11, (64 * 3 + 16 * 2), dst); - - vec_store_le (v12, (64 * 0 + 16 * 3), dst); - vec_store_le (v13, (64 * 1 + 16 * 3), dst); - vec_store_le (v14, (64 * 2 + 16 * 3), dst); - vec_store_le (v15, (64 * 3 + 16 * 3), dst); - - src += 4*64; - dst += 4*64; - - nblks -= 4; - } - while (nblks); - - vec_vsx_st (state3, 3 * 16, state); - - return 0; -} diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h deleted file mode 100644 index ded06762b6..0000000000 --- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h +++ /dev/null @@ -1,37 +0,0 @@ -/* PowerPC optimization for ChaCha20. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <stdbool.h> -#include <ldsodefs.h> - -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static void -chacha20_crypt (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, - "CHACHA20_BUFSIZE not multiple of 4"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); - - __chacha20_power8_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -} diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile index 96c110f490..66ed844e68 100644 --- a/sysdeps/s390/s390-64/Makefile +++ b/sysdeps/s390/s390-64/Makefile @@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache endif endif # $(subdir) == elf - -ifeq ($(subdir),stdlib) -sysdep_routines += \ - chacha20-s390x \ - # sysdep_routines -endif diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S deleted file mode 100644 index e38504d370..0000000000 --- a/sysdeps/s390/s390-64/chacha20-s390x.S +++ /dev/null @@ -1,573 +0,0 @@ -/* Optimized s390x implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -/* chacha20-s390x.S - zSeries implementation of ChaCha20 cipher - - Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. - */ - -#include <sysdep.h> - -#ifdef HAVE_S390_VX_ASM_SUPPORT - -/* CFA expressions are used for pointing CFA and registers to - * SP relative offsets. */ -# define DW_REGNO_SP 15 - -/* Fixed length encoding used for integers for now. 
*/ -# define DW_SLEB128_7BIT(value) \ - 0x00|((value) & 0x7f) -# define DW_SLEB128_28BIT(value) \ - 0x80|((value)&0x7f), \ - 0x80|(((value)>>7)&0x7f), \ - 0x80|(((value)>>14)&0x7f), \ - 0x00|(((value)>>21)&0x7f) - -# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \ - .cfi_escape \ - 0x0f, /* DW_CFA_def_cfa_expression */ \ - DW_SLEB128_7BIT(11), /* length */ \ - 0x7f, /* DW_OP_breg15, rsp + constant */ \ - DW_SLEB128_28BIT(rsp_offs), \ - 0x06, /* DW_OP_deref */ \ - 0x23, /* DW_OP_plus_constu */ \ - DW_SLEB128_28BIT((cfa_depth)+160) - -.machine "z13+vx" -.text - -.balign 16 -.Lconsts: -.Lwordswap: - .byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3 -.Lbswap128: - .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 -.Lbswap32: - .byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 -.Lone: - .long 0, 0, 0, 1 -.Ladd_counter_0123: - .long 0, 1, 2, 3 -.Ladd_counter_4567: - .long 4, 5, 6, 7 - -/* register macros */ -#define INPUT %r2 -#define DST %r3 -#define SRC %r4 -#define NBLKS %r0 -#define ROUND %r1 - -/* stack structure */ - -#define STACK_FRAME_STD (8 * 16 + 8 * 4) -#define STACK_FRAME_F8_F15 (8 * 8) -#define STACK_FRAME_Y0_Y15 (16 * 16) -#define STACK_FRAME_CTR (4 * 16) -#define STACK_FRAME_PARAMS (6 * 8) - -#define STACK_MAX (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \ - STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \ - STACK_FRAME_PARAMS) - -#define STACK_F8 (STACK_MAX - STACK_FRAME_F8_F15) -#define STACK_F9 (STACK_F8 + 8) -#define STACK_F10 (STACK_F9 + 8) -#define STACK_F11 (STACK_F10 + 8) -#define STACK_F12 (STACK_F11 + 8) -#define STACK_F13 (STACK_F12 + 8) -#define STACK_F14 (STACK_F13 + 8) -#define STACK_F15 (STACK_F14 + 8) -#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15) -#define STACK_CTR (STACK_Y0_Y15 - STACK_FRAME_CTR) -#define STACK_INPUT (STACK_CTR - STACK_FRAME_PARAMS) -#define STACK_DST (STACK_INPUT + 8) -#define STACK_SRC (STACK_DST + 8) -#define STACK_NBLKS (STACK_SRC + 8) -#define STACK_POCTX (STACK_NBLKS + 8) -#define STACK_POSRC 
(STACK_POCTX + 8) - -#define STACK_G0_H3 STACK_Y0_Y15 - -/* vector registers */ -#define A0 %v0 -#define A1 %v1 -#define A2 %v2 -#define A3 %v3 - -#define B0 %v4 -#define B1 %v5 -#define B2 %v6 -#define B3 %v7 - -#define C0 %v8 -#define C1 %v9 -#define C2 %v10 -#define C3 %v11 - -#define D0 %v12 -#define D1 %v13 -#define D2 %v14 -#define D3 %v15 - -#define E0 %v16 -#define E1 %v17 -#define E2 %v18 -#define E3 %v19 - -#define F0 %v20 -#define F1 %v21 -#define F2 %v22 -#define F3 %v23 - -#define G0 %v24 -#define G1 %v25 -#define G2 %v26 -#define G3 %v27 - -#define H0 %v28 -#define H1 %v29 -#define H2 %v30 -#define H3 %v31 - -#define IO0 E0 -#define IO1 E1 -#define IO2 E2 -#define IO3 E3 -#define IO4 F0 -#define IO5 F1 -#define IO6 F2 -#define IO7 F3 - -#define S0 G0 -#define S1 G1 -#define S2 G2 -#define S3 G3 - -#define TMP0 H0 -#define TMP1 H1 -#define TMP2 H2 -#define TMP3 H3 - -#define X0 A0 -#define X1 A1 -#define X2 A2 -#define X3 A3 -#define X4 B0 -#define X5 B1 -#define X6 B2 -#define X7 B3 -#define X8 C0 -#define X9 C1 -#define X10 C2 -#define X11 C3 -#define X12 D0 -#define X13 D1 -#define X14 D2 -#define X15 D3 - -#define Y0 E0 -#define Y1 E1 -#define Y2 E2 -#define Y3 E3 -#define Y4 F0 -#define Y5 F1 -#define Y6 F2 -#define Y7 F3 -#define Y8 G0 -#define Y9 G1 -#define Y10 G2 -#define Y11 G3 -#define Y12 H0 -#define Y13 H1 -#define Y14 H2 -#define Y15 H3 - -/********************************************************************** - helper macros - **********************************************************************/ - -#define _ /*_*/ - -#define START_STACK(last_r) \ - lgr %r0, %r15; \ - lghi %r1, ~15; \ - stmg %r6, last_r, 6 * 8(%r15); \ - aghi %r0, -STACK_MAX; \ - ngr %r0, %r1; \ - lgr %r1, %r15; \ - cfi_def_cfa_register(1); \ - lgr %r15, %r0; \ - stg %r1, 0(%r15); \ - cfi_cfa_on_stack(0, 0); \ - std %f8, STACK_F8(%r15); \ - std %f9, STACK_F9(%r15); \ - std %f10, STACK_F10(%r15); \ - std %f11, STACK_F11(%r15); \ - std %f12, STACK_F12(%r15); \ - std %f13, 
STACK_F13(%r15); \ - std %f14, STACK_F14(%r15); \ - std %f15, STACK_F15(%r15); - -#define END_STACK(last_r) \ - lg %r1, 0(%r15); \ - ld %f8, STACK_F8(%r15); \ - ld %f9, STACK_F9(%r15); \ - ld %f10, STACK_F10(%r15); \ - ld %f11, STACK_F11(%r15); \ - ld %f12, STACK_F12(%r15); \ - ld %f13, STACK_F13(%r15); \ - ld %f14, STACK_F14(%r15); \ - ld %f15, STACK_F15(%r15); \ - lmg %r6, last_r, 6 * 8(%r1); \ - lgr %r15, %r1; \ - cfi_def_cfa_register(DW_REGNO_SP); - -#define PLUS(dst,src) \ - vaf dst, dst, src; - -#define XOR(dst,src) \ - vx dst, dst, src; - -#define ROTATE(v1,c) \ - verllf v1, v1, (c)(0); - -#define WORD_ROTATE(v1,s) \ - vsldb v1, v1, v1, ((s) * 4); - -#define DST_8(OPER, I, J) \ - OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \ - OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J); - -/********************************************************************** - round macros - **********************************************************************/ - -/********************************************************************** - 8-way chacha20 ("vertical") - **********************************************************************/ - -#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ - x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,\ - y8,y9,y10,y11,y12,y13,y14,y15,\ - op1,op2,op3,op4,op5,op6,op7,op8,\ - op9,op10,op11,op12) \ - op1; \ - PLUS(x0, x1); PLUS(x4, x5); \ - PLUS(x8, x9); PLUS(x12, x13); \ - PLUS(y0, y1); PLUS(y4, y5); \ - PLUS(y8, y9); PLUS(y12, y13); \ - op2; \ - XOR(x3, x0); XOR(x7, x4); \ - XOR(x11, x8); XOR(x15, x12); \ - XOR(y3, y0); XOR(y7, y4); \ - XOR(y11, y8); XOR(y15, y12); \ - op3; \ - ROTATE(x3, 16); ROTATE(x7, 16); \ - ROTATE(x11, 16); ROTATE(x15, 16); \ - ROTATE(y3, 16); ROTATE(y7, 16); \ - ROTATE(y11, 16); ROTATE(y15, 16); \ - op4; \ - PLUS(x2, x3); PLUS(x6, x7); \ - PLUS(x10, x11); PLUS(x14, x15); \ - PLUS(y2, y3); PLUS(y6, y7); \ - PLUS(y10, y11); PLUS(y14, y15); \ - op5; \ - XOR(x1, x2); XOR(x5, x6); \ - 
XOR(x9, x10); XOR(x13, x14); \ - XOR(y1, y2); XOR(y5, y6); \ - XOR(y9, y10); XOR(y13, y14); \ - op6; \ - ROTATE(x1,12); ROTATE(x5,12); \ - ROTATE(x9,12); ROTATE(x13,12); \ - ROTATE(y1,12); ROTATE(y5,12); \ - ROTATE(y9,12); ROTATE(y13,12); \ - op7; \ - PLUS(x0, x1); PLUS(x4, x5); \ - PLUS(x8, x9); PLUS(x12, x13); \ - PLUS(y0, y1); PLUS(y4, y5); \ - PLUS(y8, y9); PLUS(y12, y13); \ - op8; \ - XOR(x3, x0); XOR(x7, x4); \ - XOR(x11, x8); XOR(x15, x12); \ - XOR(y3, y0); XOR(y7, y4); \ - XOR(y11, y8); XOR(y15, y12); \ - op9; \ - ROTATE(x3,8); ROTATE(x7,8); \ - ROTATE(x11,8); ROTATE(x15,8); \ - ROTATE(y3,8); ROTATE(y7,8); \ - ROTATE(y11,8); ROTATE(y15,8); \ - op10; \ - PLUS(x2, x3); PLUS(x6, x7); \ - PLUS(x10, x11); PLUS(x14, x15); \ - PLUS(y2, y3); PLUS(y6, y7); \ - PLUS(y10, y11); PLUS(y14, y15); \ - op11; \ - XOR(x1, x2); XOR(x5, x6); \ - XOR(x9, x10); XOR(x13, x14); \ - XOR(y1, y2); XOR(y5, y6); \ - XOR(y9, y10); XOR(y13, y14); \ - op12; \ - ROTATE(x1,7); ROTATE(x5,7); \ - ROTATE(x9,7); ROTATE(x13,7); \ - ROTATE(y1,7); ROTATE(y5,7); \ - ROTATE(y9,7); ROTATE(y13,7); - -#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \ - QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ - x8,x9,x10,x11,x12,x13,x14,x15,\ - y0,y1,y2,y3,y4,y5,y6,y7,\ - y8,y9,y10,y11,y12,y13,y14,y15,\ - ,,,,,,,,,,,) - -#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \ - vmrhf tmp0, v0, v1; \ - vmrhf tmp1, v2, v3; \ - vmrlf tmp2, v0, v1; \ - vmrlf v3, v2, v3; \ - vmrhf tmpa, va, vb; \ - vmrhf tmpb, vc, vd; \ - vmrlf tmpc, va, vb; \ - vmrlf vd, vc, vd; \ - vpdi v0, tmp0, tmp1, 0; \ - vpdi v1, tmp0, tmp1, 5; \ - vpdi v2, tmp2, v3, 0; \ - vpdi v3, tmp2, v3, 5; \ - vpdi va, tmpa, tmpb, 0; \ - vpdi vb, tmpa, tmpb, 5; \ - vpdi vc, tmpc, vd, 0; \ - vpdi vd, tmpc, vd, 5; - -.balign 8 -.globl __chacha20_s390x_vx_blocks8 -ENTRY (__chacha20_s390x_vx_blocks8) - /* input: - * %r2: input - * %r3: dst - * 
%r4: src - * %r5: nblks (multiple of 8) - */ - - START_STACK(%r8); - lgr NBLKS, %r5; - - larl %r7, .Lconsts; - - /* Load counter. */ - lg %r8, (12 * 4)(INPUT); - rllg %r8, %r8, 32; - -.balign 4 - /* Process eight chacha20 blocks per loop. */ -.Lloop8: - vlm Y0, Y3, 0(INPUT); - - slgfi NBLKS, 8; - lghi ROUND, (20 / 2); - - /* Construct counter vectors X12/X13 & Y12/Y13. */ - vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7); - vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7); - vrepf Y12, Y3, 0; - vrepf Y13, Y3, 1; - vaccf X5, Y12, X4; - vaccf Y5, Y12, Y4; - vaf X12, Y12, X4; - vaf Y12, Y12, Y4; - vaf X13, Y13, X5; - vaf Y13, Y13, Y5; - - vrepf X0, Y0, 0; - vrepf X1, Y0, 1; - vrepf X2, Y0, 2; - vrepf X3, Y0, 3; - vrepf X4, Y1, 0; - vrepf X5, Y1, 1; - vrepf X6, Y1, 2; - vrepf X7, Y1, 3; - vrepf X8, Y2, 0; - vrepf X9, Y2, 1; - vrepf X10, Y2, 2; - vrepf X11, Y2, 3; - vrepf X14, Y3, 2; - vrepf X15, Y3, 3; - - /* Store counters for blocks 0-7. */ - vstm X12, X13, (STACK_CTR + 0 * 16)(%r15); - vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15); - - vlr Y0, X0; - vlr Y1, X1; - vlr Y2, X2; - vlr Y3, X3; - vlr Y4, X4; - vlr Y5, X5; - vlr Y6, X6; - vlr Y7, X7; - vlr Y8, X8; - vlr Y9, X9; - vlr Y10, X10; - vlr Y11, X11; - vlr Y14, X14; - vlr Y15, X15; - - /* Update and store counter. */ - agfi %r8, 8; - rllg %r5, %r8, 32; - stg %r5, (12 * 4)(INPUT); - -.balign 4 -.Lround2_8: - QUARTERROUND4_V8(X0, X4, X8, X12, X1, X5, X9, X13, - X2, X6, X10, X14, X3, X7, X11, X15, - Y0, Y4, Y8, Y12, Y1, Y5, Y9, Y13, - Y2, Y6, Y10, Y14, Y3, Y7, Y11, Y15); - QUARTERROUND4_V8(X0, X5, X10, X15, X1, X6, X11, X12, - X2, X7, X8, X13, X3, X4, X9, X14, - Y0, Y5, Y10, Y15, Y1, Y6, Y11, Y12, - Y2, Y7, Y8, Y13, Y3, Y4, Y9, Y14); - brctg ROUND, .Lround2_8; - - /* Store blocks 4-7. */ - vstm Y0, Y15, STACK_Y0_Y15(%r15); - - /* Load counters for blocks 0-3. */ - vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15); - - lghi ROUND, 1; - j .Lfirst_output_4blks_8; - -.balign 4 -.Lsecond_output_4blks_8: - /* Load blocks 4-7. 
*/ - vlm X0, X15, STACK_Y0_Y15(%r15); - - /* Load counters for blocks 4-7. */ - vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15); - - lghi ROUND, 0; - -.balign 4 - /* Output four chacha20 blocks per loop. */ -.Lfirst_output_4blks_8: - vlm Y12, Y15, 0(INPUT); - PLUS(X12, Y0); - PLUS(X13, Y1); - vrepf Y0, Y12, 0; - vrepf Y1, Y12, 1; - vrepf Y2, Y12, 2; - vrepf Y3, Y12, 3; - vrepf Y4, Y13, 0; - vrepf Y5, Y13, 1; - vrepf Y6, Y13, 2; - vrepf Y7, Y13, 3; - vrepf Y8, Y14, 0; - vrepf Y9, Y14, 1; - vrepf Y10, Y14, 2; - vrepf Y11, Y14, 3; - vrepf Y14, Y15, 2; - vrepf Y15, Y15, 3; - PLUS(X0, Y0); - PLUS(X1, Y1); - PLUS(X2, Y2); - PLUS(X3, Y3); - PLUS(X4, Y4); - PLUS(X5, Y5); - PLUS(X6, Y6); - PLUS(X7, Y7); - PLUS(X8, Y8); - PLUS(X9, Y9); - PLUS(X10, Y10); - PLUS(X11, Y11); - PLUS(X14, Y14); - PLUS(X15, Y15); - - vl Y15, (.Lbswap32 - .Lconsts)(%r7); - TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7, - Y9, Y10, Y11, Y12, Y13, Y14); - TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15, - Y9, Y10, Y11, Y12, Y13, Y14); - - vlm Y0, Y14, 0(SRC); - vperm X0, X0, X0, Y15; - vperm X1, X1, X1, Y15; - vperm X2, X2, X2, Y15; - vperm X3, X3, X3, Y15; - vperm X4, X4, X4, Y15; - vperm X5, X5, X5, Y15; - vperm X6, X6, X6, Y15; - vperm X7, X7, X7, Y15; - vperm X8, X8, X8, Y15; - vperm X9, X9, X9, Y15; - vperm X10, X10, X10, Y15; - vperm X11, X11, X11, Y15; - vperm X12, X12, X12, Y15; - vperm X13, X13, X13, Y15; - vperm X14, X14, X14, Y15; - vperm X15, X15, X15, Y15; - vl Y15, (15 * 16)(SRC); - - XOR(Y0, X0); - XOR(Y1, X4); - XOR(Y2, X8); - XOR(Y3, X12); - XOR(Y4, X1); - XOR(Y5, X5); - XOR(Y6, X9); - XOR(Y7, X13); - XOR(Y8, X2); - XOR(Y9, X6); - XOR(Y10, X10); - XOR(Y11, X14); - XOR(Y12, X3); - XOR(Y13, X7); - XOR(Y14, X11); - XOR(Y15, X15); - vstm Y0, Y15, 0(DST); - - aghi SRC, 256; - aghi DST, 256; - - clgije ROUND, 1, .Lsecond_output_4blks_8; - - clgijhe NBLKS, 8, .Lloop8; - - - END_STACK(%r8); - xgr %r2, %r2; - br %r14; -END (__chacha20_s390x_vx_blocks8) - -#endif /* HAVE_S390_VX_ASM_SUPPORT */ diff 
--git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h deleted file mode 100644 index 0c6abf77e8..0000000000 --- a/sysdeps/s390/s390-64/chacha20_arch.h +++ /dev/null @@ -1,45 +0,0 @@ -/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <stdbool.h> -#include <ldsodefs.h> -#include <sys/auxv.h> - -unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static inline void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ -#ifdef HAVE_S390_VX_ASM_SUPPORT - _Static_assert (CHACHA20_BUFSIZE % 8 == 0, - "CHACHA20_BUFSIZE not multiple of 8"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); - - if (GLRO(dl_hwcap) & HWCAP_S390_VX) - { - __chacha20_s390x_vx_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); - return; - } -#endif - chacha20_crypt_generic (state, dst, src, bytes); -} diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h index 2c58d5ae2f..a263d294b1 100644 --- a/sysdeps/unix/sysv/linux/not-cancel.h +++ b/sysdeps/unix/sysv/linux/not-cancel.h @@ -23,6 +23,7 @@ #include <sysdep.h> #include <errno.h> #include <unistd.h> +#include <sys/poll.h> #include <sys/syscall.h> #include <sys/wait.h> #include <time.h> @@ -70,9 +71,14 @@ __writev_nocancel_nostatus (int fd, const struct iovec *iov, int iovcnt) static inline int __getrandom_nocancel (void *buf, size_t buflen, unsigned int flags) { - return INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags); + return INLINE_SYSCALL_CALL (getrandom, buf, buflen, flags); } +static inline int +__poll_infinity_nocancel (struct pollfd *fds, nfds_t nfds) +{ + return INLINE_SYSCALL_CALL (ppoll, fds, nfds, NULL, NULL, 0); +} /* Uncancelable fcntl. */ __typeof (__fcntl) __fcntl64_nocancel; diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c index 0326ebb767..c8a9ed2d40 100644 --- a/sysdeps/unix/sysv/linux/tls-internal.c +++ b/sysdeps/unix/sysv/linux/tls-internal.c @@ -16,7 +16,6 @@ License along with the GNU C Library; if not, see <https://www.gnu.org/licenses/>. 
*/ -#include <stdlib/arc4random.h> #include <string.h> #include <tls-internal.h> @@ -26,13 +25,4 @@ __glibc_tls_internal_free (void) struct pthread *self = THREAD_SELF; free (self->tls_state.strsignal_buf); free (self->tls_state.strerror_l_buf); - - if (self->tls_state.rand_state != NULL) - { - /* Clear any lingering random state prior so if the thread stack is - cached it won't leak any data. */ - explicit_bzero (self->tls_state.rand_state, - sizeof (*self->tls_state.rand_state)); - free (self->tls_state.rand_state); - } } diff --git a/sysdeps/unix/sysv/linux/tls-internal.h b/sysdeps/unix/sysv/linux/tls-internal.h index ebc65d896a..2ebe977802 100644 --- a/sysdeps/unix/sysv/linux/tls-internal.h +++ b/sysdeps/unix/sysv/linux/tls-internal.h @@ -28,7 +28,6 @@ __glibc_tls_internal (void) return &THREAD_SELF->tls_state; } -/* Reset the arc4random TCB state on fork. */ extern void __glibc_tls_internal_free (void) attribute_hidden; #endif diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile index 1178475d75..c19bef2dec 100644 --- a/sysdeps/x86_64/Makefile +++ b/sysdeps/x86_64/Makefile @@ -5,13 +5,6 @@ ifeq ($(subdir),csu) gen-as-const-headers += link-defines.sym endif -ifeq ($(subdir),stdlib) -sysdep_routines += \ - chacha20-amd64-sse2 \ - chacha20-amd64-avx2 \ - # sysdep_routines -endif - ifeq ($(subdir),gmon) sysdep_routines += _mcount # We cannot compile _mcount.S with -pg because that would create diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S deleted file mode 100644 index aefd1cdbd0..0000000000 --- a/sysdeps/x86_64/chacha20-amd64-avx2.S +++ /dev/null @@ -1,328 +0,0 @@ -/* Optimized AVX2 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - - This file is part of the GNU C Library. 
- - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-amd64-avx2.S - AVX2 implementation of ChaCha20 cipher - - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. -*/ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. 
*/ - -#include <sysdep.h> - -#ifdef PIC -# define rRIP (%rip) -#else -# define rRIP -#endif - -/* register macros */ -#define INPUT %rdi -#define DST %rsi -#define SRC %rdx -#define NBLKS %rcx -#define ROUND %eax - -/* stack structure */ -#define STACK_VEC_X12 (32) -#define STACK_VEC_X13 (32 + STACK_VEC_X12) -#define STACK_TMP (32 + STACK_VEC_X13) -#define STACK_TMP1 (32 + STACK_TMP) - -#define STACK_MAX (32 + STACK_TMP1) - -/* vector registers */ -#define X0 %ymm0 -#define X1 %ymm1 -#define X2 %ymm2 -#define X3 %ymm3 -#define X4 %ymm4 -#define X5 %ymm5 -#define X6 %ymm6 -#define X7 %ymm7 -#define X8 %ymm8 -#define X9 %ymm9 -#define X10 %ymm10 -#define X11 %ymm11 -#define X12 %ymm12 -#define X13 %ymm13 -#define X14 %ymm14 -#define X15 %ymm15 - -#define X0h %xmm0 -#define X1h %xmm1 -#define X2h %xmm2 -#define X3h %xmm3 -#define X4h %xmm4 -#define X5h %xmm5 -#define X6h %xmm6 -#define X7h %xmm7 -#define X8h %xmm8 -#define X9h %xmm9 -#define X10h %xmm10 -#define X11h %xmm11 -#define X12h %xmm12 -#define X13h %xmm13 -#define X14h %xmm14 -#define X15h %xmm15 - -/********************************************************************** - helper macros - **********************************************************************/ - -/* 4x4 32-bit integer matrix transpose */ -#define transpose_4x4(x0,x1,x2,x3,t1,t2) \ - vpunpckhdq x1, x0, t2; \ - vpunpckldq x1, x0, x0; \ - \ - vpunpckldq x3, x2, t1; \ - vpunpckhdq x3, x2, x2; \ - \ - vpunpckhqdq t1, x0, x1; \ - vpunpcklqdq t1, x0, x0; \ - \ - vpunpckhqdq x2, t2, x3; \ - vpunpcklqdq x2, t2, x2; - -/* 2x2 128-bit matrix transpose */ -#define transpose_16byte_2x2(x0,x1,t1) \ - vmovdqa x0, t1; \ - vperm2i128 $0x20, x1, x0, x0; \ - vperm2i128 $0x31, x1, t1, x1; - -/********************************************************************** - 8-way chacha20 - **********************************************************************/ - -#define ROTATE2(v1,v2,c,tmp) \ - vpsrld $(32 - (c)), v1, tmp; \ - vpslld $(c), v1, v1; \ - vpaddb tmp, v1, v1; 
\ - vpsrld $(32 - (c)), v2, tmp; \ - vpslld $(c), v2, v2; \ - vpaddb tmp, v2, v2; - -#define ROTATE_SHUF_2(v1,v2,shuf) \ - vpshufb shuf, v1, v1; \ - vpshufb shuf, v2, v2; - -#define XOR(ds,s) \ - vpxor s, ds, ds; - -#define PLUS(ds,s) \ - vpaddd s, ds, ds; - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\ - interleave_op1,interleave_op2,\ - interleave_op3,interleave_op4) \ - vbroadcasti128 .Lshuf_rol16 rRIP, tmp1; \ - interleave_op1; \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE_SHUF_2(d1, d2, tmp1); \ - interleave_op2; \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 12, tmp1); \ - vbroadcasti128 .Lshuf_rol8 rRIP, tmp1; \ - interleave_op3; \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE_SHUF_2(d1, d2, tmp1); \ - interleave_op4; \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 7, tmp1); - - .section .text.avx2, "ax", @progbits - .align 32 -chacha20_data: -L(shuf_rol16): - .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13 -L(shuf_rol8): - .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14 -L(inc_counter): - .byte 0,1,2,3,4,5,6,7 -L(unsigned_cmp): - .long 0x80000000 - - .hidden __chacha20_avx2_blocks8 -ENTRY (__chacha20_avx2_blocks8) - /* input: - * %rdi: input - * %rsi: dst - * %rdx: src - * %rcx: nblks (multiple of 8) - */ - vzeroupper; - - pushq %rbp; - cfi_adjust_cfa_offset(8); - cfi_rel_offset(rbp, 0) - movq %rsp, %rbp; - cfi_def_cfa_register(rbp); - - subq $STACK_MAX, %rsp; - andq $~31, %rsp; - -L(loop8): - mov $20, ROUND; - - /* Construct counter vectors X12 and X13 */ - vpmovzxbd L(inc_counter) rRIP, X0; - vpbroadcastd L(unsigned_cmp) rRIP, X2; - vpbroadcastd (12 * 4)(INPUT), X12; - vpbroadcastd (13 * 4)(INPUT), X13; - vpaddd X0, X12, X12; - vpxor X2, X0, X0; - vpxor X2, X12, X1; - vpcmpgtd X1, X0, X0; - vpsubd X0, X13, X13; - vmovdqa X12, (STACK_VEC_X12)(%rsp); - vmovdqa X13, (STACK_VEC_X13)(%rsp); - - /* Load vectors */ - vpbroadcastd (0 * 4)(INPUT), X0; - vpbroadcastd (1 * 
4)(INPUT), X1; - vpbroadcastd (2 * 4)(INPUT), X2; - vpbroadcastd (3 * 4)(INPUT), X3; - vpbroadcastd (4 * 4)(INPUT), X4; - vpbroadcastd (5 * 4)(INPUT), X5; - vpbroadcastd (6 * 4)(INPUT), X6; - vpbroadcastd (7 * 4)(INPUT), X7; - vpbroadcastd (8 * 4)(INPUT), X8; - vpbroadcastd (9 * 4)(INPUT), X9; - vpbroadcastd (10 * 4)(INPUT), X10; - vpbroadcastd (11 * 4)(INPUT), X11; - vpbroadcastd (14 * 4)(INPUT), X14; - vpbroadcastd (15 * 4)(INPUT), X15; - vmovdqa X15, (STACK_TMP)(%rsp); - -L(round2): - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X15,,,,) - vmovdqa (STACK_TMP)(%rsp), X15; - vmovdqa X8, (STACK_TMP)(%rsp); - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,,,,) - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,,,,) - vmovdqa (STACK_TMP)(%rsp), X8; - vmovdqa X15, (STACK_TMP)(%rsp); - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X15,,,,) - sub $2, ROUND; - jnz L(round2); - - vmovdqa X8, (STACK_TMP1)(%rsp); - - /* tmp := X15 */ - vpbroadcastd (0 * 4)(INPUT), X15; - PLUS(X0, X15); - vpbroadcastd (1 * 4)(INPUT), X15; - PLUS(X1, X15); - vpbroadcastd (2 * 4)(INPUT), X15; - PLUS(X2, X15); - vpbroadcastd (3 * 4)(INPUT), X15; - PLUS(X3, X15); - vpbroadcastd (4 * 4)(INPUT), X15; - PLUS(X4, X15); - vpbroadcastd (5 * 4)(INPUT), X15; - PLUS(X5, X15); - vpbroadcastd (6 * 4)(INPUT), X15; - PLUS(X6, X15); - vpbroadcastd (7 * 4)(INPUT), X15; - PLUS(X7, X15); - transpose_4x4(X0, X1, X2, X3, X8, X15); - transpose_4x4(X4, X5, X6, X7, X8, X15); - vmovdqa (STACK_TMP1)(%rsp), X8; - transpose_16byte_2x2(X0, X4, X15); - transpose_16byte_2x2(X1, X5, X15); - transpose_16byte_2x2(X2, X6, X15); - transpose_16byte_2x2(X3, X7, X15); - vmovdqa (STACK_TMP)(%rsp), X15; - vmovdqu X0, (64 * 0 + 16 * 0)(DST) - vmovdqu X1, (64 * 1 + 16 * 0)(DST) - vpbroadcastd (8 * 4)(INPUT), X0; - PLUS(X8, X0); - vpbroadcastd (9 * 4)(INPUT), X0; - PLUS(X9, X0); - vpbroadcastd (10 * 4)(INPUT), X0; - PLUS(X10, X0); - vpbroadcastd (11 * 4)(INPUT), X0; - PLUS(X11, X0); - vmovdqa 
(STACK_VEC_X12)(%rsp), X0; - PLUS(X12, X0); - vmovdqa (STACK_VEC_X13)(%rsp), X0; - PLUS(X13, X0); - vpbroadcastd (14 * 4)(INPUT), X0; - PLUS(X14, X0); - vpbroadcastd (15 * 4)(INPUT), X0; - PLUS(X15, X0); - vmovdqu X2, (64 * 2 + 16 * 0)(DST) - vmovdqu X3, (64 * 3 + 16 * 0)(DST) - - /* Update counter */ - addq $8, (12 * 4)(INPUT); - - transpose_4x4(X8, X9, X10, X11, X0, X1); - transpose_4x4(X12, X13, X14, X15, X0, X1); - vmovdqu X4, (64 * 4 + 16 * 0)(DST) - vmovdqu X5, (64 * 5 + 16 * 0)(DST) - transpose_16byte_2x2(X8, X12, X0); - transpose_16byte_2x2(X9, X13, X0); - transpose_16byte_2x2(X10, X14, X0); - transpose_16byte_2x2(X11, X15, X0); - vmovdqu X6, (64 * 6 + 16 * 0)(DST) - vmovdqu X7, (64 * 7 + 16 * 0)(DST) - vmovdqu X8, (64 * 0 + 16 * 2)(DST) - vmovdqu X9, (64 * 1 + 16 * 2)(DST) - vmovdqu X10, (64 * 2 + 16 * 2)(DST) - vmovdqu X11, (64 * 3 + 16 * 2)(DST) - vmovdqu X12, (64 * 4 + 16 * 2)(DST) - vmovdqu X13, (64 * 5 + 16 * 2)(DST) - vmovdqu X14, (64 * 6 + 16 * 2)(DST) - vmovdqu X15, (64 * 7 + 16 * 2)(DST) - - sub $8, NBLKS; - lea (8 * 64)(DST), DST; - lea (8 * 64)(SRC), SRC; - jnz L(loop8); - - vzeroupper; - - /* eax zeroed by round loop. */ - leave; - cfi_adjust_cfa_offset(-8) - cfi_def_cfa_register(%rsp); - ret; - int3; -END(__chacha20_avx2_blocks8) diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S deleted file mode 100644 index 351a1109c6..0000000000 --- a/sysdeps/x86_64/chacha20-amd64-sse2.S +++ /dev/null @@ -1,311 +0,0 @@ -/* Optimized SSE2 implementation of ChaCha20 cipher. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. 
- - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -/* chacha20-amd64-ssse3.S - SSSE3 implementation of ChaCha20 cipher - - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> - - This file is part of Libgcrypt. - - Libgcrypt is free software; you can redistribute it and/or modify - it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of - the License, or (at your option) any later version. - - Libgcrypt is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with this program; if not, see <https://www.gnu.org/licenses/>. -*/ - -/* Based on D. J. Bernstein reference implementation at - http://cr.yp.to/chacha.html: - - chacha-regs.c version 20080118 - D. J. Bernstein - Public domain. 
*/ - -#include <sysdep.h> -#include <isa-level.h> - -#if MINIMUM_X86_ISA_LEVEL <= 2 - -#ifdef PIC -# define rRIP (%rip) -#else -# define rRIP -#endif - -/* 'ret' instruction replacement for straight-line speculation mitigation */ -#define ret_spec_stop \ - ret; int3; - -/* register macros */ -#define INPUT %rdi -#define DST %rsi -#define SRC %rdx -#define NBLKS %rcx -#define ROUND %eax - -/* stack structure */ -#define STACK_VEC_X12 (16) -#define STACK_VEC_X13 (16 + STACK_VEC_X12) -#define STACK_TMP (16 + STACK_VEC_X13) -#define STACK_TMP1 (16 + STACK_TMP) -#define STACK_TMP2 (16 + STACK_TMP1) - -#define STACK_MAX (16 + STACK_TMP2) - -/* vector registers */ -#define X0 %xmm0 -#define X1 %xmm1 -#define X2 %xmm2 -#define X3 %xmm3 -#define X4 %xmm4 -#define X5 %xmm5 -#define X6 %xmm6 -#define X7 %xmm7 -#define X8 %xmm8 -#define X9 %xmm9 -#define X10 %xmm10 -#define X11 %xmm11 -#define X12 %xmm12 -#define X13 %xmm13 -#define X14 %xmm14 -#define X15 %xmm15 - -/********************************************************************** - helper macros - **********************************************************************/ - -/* 4x4 32-bit integer matrix transpose */ -#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \ - movdqa x0, t2; \ - punpckhdq x1, t2; \ - punpckldq x1, x0; \ - \ - movdqa x2, t1; \ - punpckldq x3, t1; \ - punpckhdq x3, x2; \ - \ - movdqa x0, x1; \ - punpckhqdq t1, x1; \ - punpcklqdq t1, x0; \ - \ - movdqa t2, x3; \ - punpckhqdq x2, x3; \ - punpcklqdq x2, t2; \ - movdqa t2, x2; - -/* fill xmm register with 32-bit value from memory */ -#define PBROADCASTD(mem32, xreg) \ - movd mem32, xreg; \ - pshufd $0, xreg, xreg; - -/********************************************************************** - 4-way chacha20 - **********************************************************************/ - -#define ROTATE2(v1,v2,c,tmp1,tmp2) \ - movdqa v1, tmp1; \ - movdqa v2, tmp2; \ - psrld $(32 - (c)), v1; \ - pslld $(c), tmp1; \ - paddb tmp1, v1; \ - psrld $(32 - (c)), v2; \ - 
pslld $(c), tmp2; \ - paddb tmp2, v2; - -#define XOR(ds,s) \ - pxor s, ds; - -#define PLUS(ds,s) \ - paddd s, ds; - -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2) \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE2(d1, d2, 16, tmp1, tmp2); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 12, tmp1, tmp2); \ - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ - ROTATE2(d1, d2, 8, tmp1, tmp2); \ - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ - ROTATE2(b1, b2, 7, tmp1, tmp2); - - .section .text.sse2,"ax",@progbits - -chacha20_data: - .align 16 -L(counter1): - .long 1,0,0,0 -L(inc_counter): - .long 0,1,2,3 -L(unsigned_cmp): - .long 0x80000000,0x80000000,0x80000000,0x80000000 - - .hidden __chacha20_sse2_blocks4 -ENTRY (__chacha20_sse2_blocks4) - /* input: - * %rdi: input - * %rsi: dst - * %rdx: src - * %rcx: nblks (multiple of 4) - */ - - pushq %rbp; - cfi_adjust_cfa_offset(8); - cfi_rel_offset(rbp, 0) - movq %rsp, %rbp; - cfi_def_cfa_register(%rbp); - - subq $STACK_MAX, %rsp; - andq $~15, %rsp; - -L(loop4): - mov $20, ROUND; - - /* Construct counter vectors X12 and X13 */ - movdqa L(inc_counter) rRIP, X0; - movdqa L(unsigned_cmp) rRIP, X2; - PBROADCASTD((12 * 4)(INPUT), X12); - PBROADCASTD((13 * 4)(INPUT), X13); - paddd X0, X12; - movdqa X12, X1; - pxor X2, X0; - pxor X2, X1; - pcmpgtd X1, X0; - psubd X0, X13; - movdqa X12, (STACK_VEC_X12)(%rsp); - movdqa X13, (STACK_VEC_X13)(%rsp); - - /* Load vectors */ - PBROADCASTD((0 * 4)(INPUT), X0); - PBROADCASTD((1 * 4)(INPUT), X1); - PBROADCASTD((2 * 4)(INPUT), X2); - PBROADCASTD((3 * 4)(INPUT), X3); - PBROADCASTD((4 * 4)(INPUT), X4); - PBROADCASTD((5 * 4)(INPUT), X5); - PBROADCASTD((6 * 4)(INPUT), X6); - PBROADCASTD((7 * 4)(INPUT), X7); - PBROADCASTD((8 * 4)(INPUT), X8); - PBROADCASTD((9 * 4)(INPUT), X9); - PBROADCASTD((10 * 4)(INPUT), X10); - PBROADCASTD((11 * 4)(INPUT), X11); - PBROADCASTD((14 * 4)(INPUT), X14); - PBROADCASTD((15 * 4)(INPUT), X15); - movdqa X11, 
(STACK_TMP)(%rsp); - movdqa X15, (STACK_TMP1)(%rsp); - -L(round2_4): - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X11,X15) - movdqa (STACK_TMP)(%rsp), X11; - movdqa (STACK_TMP1)(%rsp), X15; - movdqa X8, (STACK_TMP)(%rsp); - movdqa X9, (STACK_TMP1)(%rsp); - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,X9) - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,X9) - movdqa (STACK_TMP)(%rsp), X8; - movdqa (STACK_TMP1)(%rsp), X9; - movdqa X11, (STACK_TMP)(%rsp); - movdqa X15, (STACK_TMP1)(%rsp); - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X11,X15) - sub $2, ROUND; - jnz L(round2_4); - - /* tmp := X15 */ - movdqa (STACK_TMP)(%rsp), X11; - PBROADCASTD((0 * 4)(INPUT), X15); - PLUS(X0, X15); - PBROADCASTD((1 * 4)(INPUT), X15); - PLUS(X1, X15); - PBROADCASTD((2 * 4)(INPUT), X15); - PLUS(X2, X15); - PBROADCASTD((3 * 4)(INPUT), X15); - PLUS(X3, X15); - PBROADCASTD((4 * 4)(INPUT), X15); - PLUS(X4, X15); - PBROADCASTD((5 * 4)(INPUT), X15); - PLUS(X5, X15); - PBROADCASTD((6 * 4)(INPUT), X15); - PLUS(X6, X15); - PBROADCASTD((7 * 4)(INPUT), X15); - PLUS(X7, X15); - PBROADCASTD((8 * 4)(INPUT), X15); - PLUS(X8, X15); - PBROADCASTD((9 * 4)(INPUT), X15); - PLUS(X9, X15); - PBROADCASTD((10 * 4)(INPUT), X15); - PLUS(X10, X15); - PBROADCASTD((11 * 4)(INPUT), X15); - PLUS(X11, X15); - movdqa (STACK_VEC_X12)(%rsp), X15; - PLUS(X12, X15); - movdqa (STACK_VEC_X13)(%rsp), X15; - PLUS(X13, X15); - movdqa X13, (STACK_TMP)(%rsp); - PBROADCASTD((14 * 4)(INPUT), X15); - PLUS(X14, X15); - movdqa (STACK_TMP1)(%rsp), X15; - movdqa X14, (STACK_TMP1)(%rsp); - PBROADCASTD((15 * 4)(INPUT), X13); - PLUS(X15, X13); - movdqa X15, (STACK_TMP2)(%rsp); - - /* Update counter */ - addq $4, (12 * 4)(INPUT); - - TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15); - movdqu X0, (64 * 0 + 16 * 0)(DST) - movdqu X1, (64 * 1 + 16 * 0)(DST) - movdqu X2, (64 * 2 + 16 * 0)(DST) - movdqu X3, (64 * 3 + 16 * 0)(DST) - TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2); - movdqa (STACK_TMP)(%rsp), 
X13; - movdqa (STACK_TMP1)(%rsp), X14; - movdqa (STACK_TMP2)(%rsp), X15; - movdqu X4, (64 * 0 + 16 * 1)(DST) - movdqu X5, (64 * 1 + 16 * 1)(DST) - movdqu X6, (64 * 2 + 16 * 1)(DST) - movdqu X7, (64 * 3 + 16 * 1)(DST) - TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2); - movdqu X8, (64 * 0 + 16 * 2)(DST) - movdqu X9, (64 * 1 + 16 * 2)(DST) - movdqu X10, (64 * 2 + 16 * 2)(DST) - movdqu X11, (64 * 3 + 16 * 2)(DST) - TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2); - movdqu X12, (64 * 0 + 16 * 3)(DST) - movdqu X13, (64 * 1 + 16 * 3)(DST) - movdqu X14, (64 * 2 + 16 * 3)(DST) - movdqu X15, (64 * 3 + 16 * 3)(DST) - - sub $4, NBLKS; - lea (4 * 64)(DST), DST; - lea (4 * 64)(SRC), SRC; - jnz L(loop4); - - /* eax zeroed by round loop. */ - leave; - cfi_adjust_cfa_offset(-8) - cfi_def_cfa_register(%rsp); - ret_spec_stop; -END (__chacha20_sse2_blocks4) - -#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */ diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h deleted file mode 100644 index 6f3784e392..0000000000 --- a/sysdeps/x86_64/chacha20_arch.h +++ /dev/null @@ -1,55 +0,0 @@ -/* Chacha20 implementation, used on arc4random. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. 
*/ - -#include <isa-level.h> -#include <ldsodefs.h> -#include <cpu-features.h> -#include <sys/param.h> - -unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; -unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst, - const uint8_t *src, size_t nblks) - attribute_hidden; - -static inline void -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, - size_t bytes) -{ - _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0, - "CHACHA20_BUFSIZE not multiple of 4 or 8"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); - -#if MINIMUM_X86_ISA_LEVEL > 2 - __chacha20_avx2_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -#else - const struct cpu_features* cpu_features = __get_cpu_features (); - - /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. */ - if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) - && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !)) - __chacha20_avx2_blocks8 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); - else - __chacha20_sse2_blocks4 (state, dst, src, - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); -#endif -} -- 2.35.1 ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v6] arc4random: simplify design for better safety 2022-07-26 19:58 ` [PATCH v6] " Jason A. Donenfeld @ 2022-07-26 20:17 ` Adhemerval Zanella Netto 2022-07-26 20:56 ` Adhemerval Zanella Netto 2022-07-28 10:29 ` Szabolcs Nagy 1 sibling, 1 reply; 81+ messages in thread From: Adhemerval Zanella Netto @ 2022-07-26 20:17 UTC (permalink / raw) To: Jason A. Donenfeld, libc-alpha Cc: Florian Weimer, Cristian Rodríguez, Paul Eggert, Mark Harris, Eric Biggers, linux-crypto On 26/07/22 16:58, Jason A. Donenfeld wrote: > Rather than buffering 16 MiB of entropy in userspace (by way of > chacha20), simply call getrandom() every time. > > This approach is doubtlessly slower, for now, but trying to prematurely > optimize arc4random appears to be leading toward all sorts of nasty > properties and gotchas. Instead, this patch takes a much more > conservative approach. The interface is added as a basic loop wrapper > around getrandom(), and then later, the kernel and libc together can > work together on optimizing that. > > This prevents numerous issues in which userspace is unaware of when it > really must throw away its buffer, since we avoid buffering all > together. Future improvements may include userspace learning more from > the kernel about when to do that, which might make these sorts of > chacha20-based optimizations more possible. The current heuristic of 16 > MiB is meaningless garbage that doesn't correspond to anything the > kernel might know about. So for now, let's just do something > conservative that we know is correct and won't lead to cryptographic > issues for users of this function. > > This patch might be considered along the lines of, "optimization is the > root of all evil," in that the much more complex implementation it > replaces moves too fast without considering security implications, > whereas the incremental approach done here is a much safer way of going > about things. 
Once this lands, we can take our time in optimizing this > properly using new interplay between the kernel and userspace. > > getrandom(0) is used, since that's the one that ensures the bytes > returned are cryptographically secure. But on systems without it, we > fallback to using /dev/urandom. This is unfortunate because it means > opening a file descriptor, but there's not much of a choice. Secondly, > as part of the fallback, in order to get more or less the same > properties of getrandom(0), we poll on /dev/random, and if the poll > succeeds at least once, then we assume the RNG is initialized. This is a > rough approximation, as the ancient "non-blocking pool" initialized > after the "blocking pool", not before, and it may not port back to all > ancient kernels, though it does to all kernels supported by glibc > (≥3.2), so generally it's the best approximation we can do. > > The motivation for including arc4random, in the first place, is to have > source-level compatibility with existing code. That means this patch > doesn't attempt to litigate the interface itself. It does, however, > choose a conservative approach for implementing it. LGTM. I agree this is a safe solution for 2.36; we can optimize it later if need be. I will run some tests and push it upstream. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> > > Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> > Cc: Florian Weimer <fweimer@redhat.com> > Cc: Cristian Rodríguez <crrodriguez@opensuse.org> > Cc: Paul Eggert <eggert@cs.ucla.edu> > Cc: Mark Harris <mark.hsj@gmail.com> > Cc: Eric Biggers <ebiggers@kernel.org> > Cc: linux-crypto@vger.kernel.org > Signed-off-by: Jason A.
Donenfeld <Jason@zx2c4.com> > --- > LICENSES | 23 - > NEWS | 4 +- > include/stdlib.h | 3 - > manual/math.texi | 13 +- > stdlib/Makefile | 2 - > stdlib/arc4random.c | 196 ++---- > stdlib/arc4random.h | 48 -- > stdlib/chacha20.c | 191 ------ > stdlib/tst-arc4random-chacha20.c | 167 ----- > sysdeps/aarch64/Makefile | 4 - > sysdeps/aarch64/chacha20-aarch64.S | 314 ---------- > sysdeps/aarch64/chacha20_arch.h | 40 -- > sysdeps/generic/chacha20_arch.h | 24 - > sysdeps/generic/not-cancel.h | 3 + > sysdeps/generic/tls-internal-struct.h | 1 - > sysdeps/generic/tls-internal.c | 10 - > sysdeps/mach/hurd/_Fork.c | 2 - > sysdeps/mach/hurd/not-cancel.h | 4 + > sysdeps/nptl/_Fork.c | 2 - > .../powerpc/powerpc64/be/multiarch/Makefile | 4 - > .../powerpc64/be/multiarch/chacha20-ppc.c | 1 - > .../powerpc64/be/multiarch/chacha20_arch.h | 42 -- > sysdeps/powerpc/powerpc64/power8/Makefile | 5 - > .../powerpc/powerpc64/power8/chacha20-ppc.c | 256 -------- > .../powerpc/powerpc64/power8/chacha20_arch.h | 37 -- > sysdeps/s390/s390-64/Makefile | 6 - > sysdeps/s390/s390-64/chacha20-s390x.S | 573 ------------------ > sysdeps/s390/s390-64/chacha20_arch.h | 45 -- > sysdeps/unix/sysv/linux/not-cancel.h | 8 +- > sysdeps/unix/sysv/linux/tls-internal.c | 10 - > sysdeps/unix/sysv/linux/tls-internal.h | 1 - > sysdeps/x86_64/Makefile | 7 - > sysdeps/x86_64/chacha20-amd64-avx2.S | 328 ---------- > sysdeps/x86_64/chacha20-amd64-sse2.S | 311 ---------- > sysdeps/x86_64/chacha20_arch.h | 55 -- > 35 files changed, 64 insertions(+), 2676 deletions(-) > delete mode 100644 stdlib/arc4random.h > delete mode 100644 stdlib/chacha20.c > delete mode 100644 stdlib/tst-arc4random-chacha20.c > delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S > delete mode 100644 sysdeps/aarch64/chacha20_arch.h > delete mode 100644 sysdeps/generic/chacha20_arch.h > delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile > delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c > delete mode 100644 
sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h > delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c > delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h > delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S > delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h > delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S > delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S > delete mode 100644 sysdeps/x86_64/chacha20_arch.h > > diff --git a/LICENSES b/LICENSES > index cd04fb6e84..530893b1dc 100644 > --- a/LICENSES > +++ b/LICENSES > @@ -389,26 +389,3 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov> > You should have received a copy of the GNU Lesser General Public > License along with this library; if not, see > <https://www.gnu.org/licenses/>. */ > -\f > -sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S, > -sysdeps/x86_64/chacha20-amd64-avx2.S, and > -sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and > -sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt, > -with the following notices: > - > -Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> > - > -This file is part of Libgcrypt. > - > -Libgcrypt is free software; you can redistribute it and/or modify > -it under the terms of the GNU Lesser General Public License as > -published by the Free Software Foundation; either version 2.1 of > -the License, or (at your option) any later version. > - > -Libgcrypt is distributed in the hope that it will be useful, > -but WITHOUT ANY WARRANTY; without even the implied warranty of > -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > -GNU Lesser General Public License for more details. > - > -You should have received a copy of the GNU Lesser General Public > -License along with this program; if not, see <https://www.gnu.org/licenses/>. Ok. 
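For readers following the commit message, the getrandom-first flow with the /dev/random poll and /dev/urandom fallback can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the patch: sketch_random_buf is a hypothetical name, it returns -1 where the patch aborts the process, it polls /dev/random on every fallback rather than caching the result, and it uses the public getrandom, poll, open and read calls instead of glibc's internal *_nocancel wrappers. It keeps the return value in a signed ssize_t so that error returns can be tested.

```c
/* Sketch only: sketch_random_buf is a hypothetical helper, not the glibc
   function.  It mirrors the flow described in the commit message using
   public interfaces.  */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill P with N random bytes.  Return 0 on success, -1 on failure
   (the patch aborts the process instead of returning an error).  */
int
sketch_random_buf (void *p, size_t n)
{
  uint8_t *out = p;
  ssize_t l;
  int r;

  if (n == 0)
    return 0;

  /* Preferred path: getrandom with flags 0 blocks until the kernel RNG
     is initialized, then returns bytes from it.  */
  for (;;)
    {
      l = getrandom (out, n, 0);
      if (l > 0)
        {
          if ((size_t) l == n)
            return 0;           /* Done reading, success.  */
          out += l;             /* Short read; keep going.  */
          n -= (size_t) l;
          continue;
        }
      if (l < 0 && errno == EINTR)
        continue;               /* Interrupted before any data; retry.  */
      if (l < 0 && errno == ENOSYS)
        break;                  /* No syscall; fall back to the devices.  */
      return -1;
    }

  /* Fallback, step 1: wait until /dev/random is readable, as a rough
     approximation of "the RNG has been initialized".  */
  struct pollfd pfd = { .events = POLLIN };
  pfd.fd = open ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY);
  if (pfd.fd < 0)
    return -1;
  do
    r = poll (&pfd, 1, -1);
  while (r < 0 && errno == EINTR);
  close (pfd.fd);
  if (r < 0)
    return -1;

  /* Fallback, step 2: read the requested bytes from /dev/urandom.  */
  int fd = open ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY);
  if (fd < 0)
    return -1;
  while (n > 0)
    {
      l = read (fd, out, n);
      if (l < 0 && errno == EINTR)
        continue;
      if (l <= 0)
        {
          close (fd);
          return -1;
        }
      out += l;
      n -= (size_t) l;
    }
  close (fd);
  return 0;
}
```

Nothing is buffered between calls, so there is no userspace state to discard on fork, VM snapshot or resume. That is the property the commit message is after, and the cost is one syscall per request.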
> diff --git a/NEWS b/NEWS > index 8420a65cd0..fe531bfe1e 100644 > --- a/NEWS > +++ b/NEWS > @@ -61,8 +61,8 @@ Major new features: > is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type). > > * The functions arc4random, arc4random_buf, and arc4random_uniform have been > - added. The functions use a pseudo-random number generator along with > - entropy from the kernel. > + added. The functions wrap getrandom and/or /dev/urandom to return high- > + quality randomness from the kernel. > > Deprecated and removed features, and other changes affecting compatibility: > > diff --git a/include/stdlib.h b/include/stdlib.h > index cae7f7cdf8..db51f4a4f6 100644 > --- a/include/stdlib.h > +++ b/include/stdlib.h > @@ -152,9 +152,6 @@ __typeof (arc4random_uniform) __arc4random_uniform; > libc_hidden_proto (__arc4random_uniform); > extern void __arc4random_buf_internal (void *buffer, size_t len) > attribute_hidden; > -/* Called from the fork function to reinitialize the internal cipher state > - in child process. */ > -extern void __arc4random_fork_subprocess (void) attribute_hidden; > > extern double __strtod_internal (const char *__restrict __nptr, > char **__restrict __endptr, int __group) Ok. > diff --git a/manual/math.texi b/manual/math.texi > index 141695cc30..6d69bbff66 100644 > --- a/manual/math.texi > +++ b/manual/math.texi > @@ -1993,17 +1993,10 @@ This section describes the random number functions provided as a GNU > extension, based on OpenBSD interfaces. > > @Theglibc{} uses kernel entropy obtained either through @code{getrandom} > -or by reading @file{/dev/urandom} to seed and periodically re-seed the > -internal state. A per-thread data pool is used, which allows fast output > -generation. > +or by reading @file{/dev/urandom} to seed. > > -Although these functions provide higher random quality than ISO, BSD, and > -SVID functions, these still use a Pseudo-Random generator and should not > -be used in cryptographic contexts. 
> - > -The internal state is cleared and reseeded with kernel entropy on @code{fork} > -and @code{_Fork}. It is not cleared on either a direct @code{clone} syscall > -or when using @theglibc{} @code{syscall} function. > +These functions provide higher random quality than ISO, BSD, and SVID > +functions, and may be used in cryptographic contexts. > > The prototypes for these functions are in @file{stdlib.h}. > @pindex stdlib.h > diff --git a/stdlib/Makefile b/stdlib/Makefile > index a900962685..f7b25c1981 100644 > --- a/stdlib/Makefile > +++ b/stdlib/Makefile > @@ -246,7 +246,6 @@ tests := \ > # tests > > tests-internal := \ > - tst-arc4random-chacha20 \ > tst-strtod1i \ > tst-strtod3 \ > tst-strtod4 \ > @@ -256,7 +255,6 @@ tests-internal := \ > # tests-internal > > tests-static := \ > - tst-arc4random-chacha20 \ > tst-secure-getenv \ > # tests-static > Ok. > diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c > index 65547e79aa..0cb9991328 100644 > --- a/stdlib/arc4random.c > +++ b/stdlib/arc4random.c > @@ -1,4 +1,4 @@ > -/* Pseudo Random Number Generator based on ChaCha20. > +/* Pseudo Random Number Generator > Copyright (C) 2022 Free Software Foundation, Inc. > This file is part of the GNU C Library. > > @@ -16,7 +16,6 @@ > License along with the GNU C Library; if not, see > <https://www.gnu.org/licenses/>. */ > > -#include <arc4random.h> > #include <errno.h> > #include <not-cancel.h> > #include <stdio.h> > @@ -24,53 +23,6 @@ > #include <sys/mman.h> > #include <sys/param.h> > #include <sys/random.h> > -#include <tls-internal.h> > - > -/* arc4random keeps two counters: 'have' is the current valid bytes not yet > - consumed in 'buf' while 'count' is the maximum number of bytes until a > - reseed. > - > - Both the initial seed and reseed try to obtain entropy from the kernel > - and abort the process if none could be obtained. 
> - > - The state 'buf' improves the usage of the cipher calls, allowing to call > - optimized implementations (if the architecture provides it) and minimize > - function call overhead. */ > - > -#include <chacha20.c> > - > -/* Called from the fork function to reset the state. */ > -void > -__arc4random_fork_subprocess (void) > -{ > - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state; > - if (state != NULL) > - { > - explicit_bzero (state, sizeof (*state)); > - /* Force key init. */ > - state->count = -1; > - } > -} > - > -/* Return the current thread random state or try to create one if there is > - none available. In the case malloc can not allocate a state, arc4random > - will try to get entropy with arc4random_getentropy. */ > -static struct arc4random_state_t * > -arc4random_get_state (void) > -{ > - struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state; > - if (state == NULL) > - { > - state = malloc (sizeof (struct arc4random_state_t)); > - if (state != NULL) > - { > - /* Force key initialization on first call. */ > - state->count = -1; > - __glibc_tls_internal ()->rand_state = state; > - } > - } > - return state; > -} > > static void > arc4random_getrandom_failure (void) > @@ -78,106 +30,63 @@ arc4random_getrandom_failure (void) > __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n"); > } > > -static void > -arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen) > +void > +__arc4random_buf (void *p, size_t n) > { > - chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf); > - > - /* Mix optional user provided data. */ > - if (rnd != NULL) > - { > - size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); > - for (size_t i = 0; i < m; i++) > - state->buf[i] ^= rnd[i]; > - } > - > - /* Immediately reinit for backtracking resistance. 
*/ > - chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE); > - explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); > - state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE); > -} > + static int seen_initialized; > + size_t l; > + int fd; > > -static void > -arc4random_getentropy (void *rnd, size_t len) > -{ > - if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len) > + if (n == 0) > return; > > - int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom", > - O_RDONLY | O_CLOEXEC)); > - if (fd != -1) > + for (;;) > { > - uint8_t *p = rnd; > - uint8_t *end = p + len; > - do > + l = TEMP_FAILURE_RETRY (__getrandom_nocancel (p, n, 0)); > + if (l > 0) > { > - ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p)); > - if (ret <= 0) > - arc4random_getrandom_failure (); > - p += ret; > + if ((size_t) l == n) > + return; /* Done reading, success. */ > + p = (uint8_t *) p + l; > + n -= l; > + continue; /* Interrupted by a signal; keep going. */ > } > - while (p < end); > - > - if (__close_nocancel (fd) == 0) > - return; > + else if (l < 0 && errno == ENOSYS) > + break; /* No syscall, so fallback to /dev/urandom. */ > + arc4random_getrandom_failure (); > } > - arc4random_getrandom_failure (); > -} > > -/* Check if the thread context STATE should be reseed with kernel entropy > - depending of requested LEN bytes. If there is less than requested, > - the state is either initialized or reseeded, otherwise the internal > - counter subtract the requested length. 
*/ > -static void > -arc4random_check_stir (struct arc4random_state_t *state, size_t len) > -{ > - if (state->count <= len || state->count == -1) > + if (!atomic_load_relaxed (&seen_initialized)) > { > - uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE]; > - arc4random_getentropy (rnd, sizeof rnd); > - > - if (state->count == -1) > - chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE); > - else > - arc4random_rekey (state, rnd, sizeof rnd); > - > - explicit_bzero (rnd, sizeof rnd); > - > - /* Invalidate the buf. */ > - state->have = 0; > - memset (state->buf, 0, sizeof state->buf); > - state->count = CHACHA20_RESEED_SIZE; > + /* Poll /dev/random as an approximation of RNG initialization. */ > + struct pollfd pfd = { .events = POLLIN }; > + pfd.fd = TEMP_FAILURE_RETRY ( > + __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY)); > + if (pfd.fd < 0) > + arc4random_getrandom_failure (); > + if (TEMP_FAILURE_RETRY (__poll_infinity_nocancel (&pfd, 1)) < 0) > + arc4random_getrandom_failure (); > + if (__close_nocancel (pfd.fd) < 0) > + arc4random_getrandom_failure (); > + atomic_store_relaxed (&seen_initialized, 1); > } > - else > - state->count -= len; > -} > > -void > -__arc4random_buf (void *buffer, size_t len) > -{ > - struct arc4random_state_t *state = arc4random_get_state (); > - if (__glibc_unlikely (state == NULL)) > - { > - arc4random_getentropy (buffer, len); > - return; > - } > - > - arc4random_check_stir (state, len); > - while (len > 0) > + fd = TEMP_FAILURE_RETRY ( > + __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY)); > + if (fd < 0) > + arc4random_getrandom_failure (); > + for (;;) > { > - if (state->have > 0) > - { > - size_t m = MIN (len, state->have); > - uint8_t *ks = state->buf + sizeof (state->buf) - state->have; > - memcpy (buffer, ks, m); > - explicit_bzero (ks, m); > - buffer += m; > - len -= m; > - state->have -= m; > - } > - if (state->have == 0) > - arc4random_rekey (state, NULL, 0); > + l = 
TEMP_FAILURE_RETRY (__read_nocancel (fd, p, n)); > + if (l <= 0) > + arc4random_getrandom_failure (); > + if ((size_t) l == n) > + break; /* Done reading, success. */ > + p = (uint8_t *) p + l; > + n -= l; > } > + if (__close_nocancel (fd) < 0) > + arc4random_getrandom_failure (); > } > libc_hidden_def (__arc4random_buf) > weak_alias (__arc4random_buf, arc4random_buf) > @@ -186,22 +95,7 @@ uint32_t > __arc4random (void) > { > uint32_t r; > - > - struct arc4random_state_t *state = arc4random_get_state (); > - if (__glibc_unlikely (state == NULL)) > - { > - arc4random_getentropy (&r, sizeof (uint32_t)); > - return r; > - } > - > - arc4random_check_stir (state, sizeof (uint32_t)); > - if (state->have < sizeof (uint32_t)) > - arc4random_rekey (state, NULL, 0); > - uint8_t *ks = state->buf + sizeof (state->buf) - state->have; > - memcpy (&r, ks, sizeof (uint32_t)); > - memset (ks, 0, sizeof (uint32_t)); > - state->have -= sizeof (uint32_t); > - > + __arc4random_buf (&r, sizeof (r)); > return r; > } > libc_hidden_def (__arc4random) Ok. > diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h > deleted file mode 100644 > index cd39389c19..0000000000 > --- a/stdlib/arc4random.h > +++ /dev/null > @@ -1,48 +0,0 @@ > -/* Arc4random definition used on TLS. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. 
> - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#ifndef _CHACHA20_H > -#define _CHACHA20_H > - > -#include <stddef.h> > -#include <stdint.h> > - > -/* Internal ChaCha20 state. */ > -#define CHACHA20_STATE_LEN 16 > -#define CHACHA20_BLOCK_SIZE 64 > - > -/* Maximum number bytes until reseed (16 MB). */ > -#define CHACHA20_RESEED_SIZE (16 * 1024 * 1024) > - > -/* Internal arc4random buffer, used on each feedback step so offer some > - backtracking protection and to allow better used of vectorized > - chacha20 implementations. */ > -#define CHACHA20_BUFSIZE (8 * CHACHA20_BLOCK_SIZE) > - > -_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE, > - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE"); > - > -struct arc4random_state_t > -{ > - uint32_t ctx[CHACHA20_STATE_LEN]; > - size_t have; > - size_t count; > - uint8_t buf[CHACHA20_BUFSIZE]; > -}; > - > -#endif Ok. > diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c > deleted file mode 100644 > index 2745a81315..0000000000 > --- a/stdlib/chacha20.c > +++ /dev/null > @@ -1,191 +0,0 @@ > -/* Generic ChaCha20 implementation (used on arc4random). > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. 
> - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#include <array_length.h> > -#include <endian.h> > -#include <stddef.h> > -#include <stdint.h> > -#include <string.h> > - > -/* 32-bit stream position, then 96-bit nonce. */ > -#define CHACHA20_IV_SIZE 16 > -#define CHACHA20_KEY_SIZE 32 > - > -#define CHACHA20_STATE_LEN 16 > - > -/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final > - XOR of the keystream with the plaintext because the plaintext is a > - stream of zeros. */ > - > -enum chacha20_constants > -{ > - CHACHA20_CONSTANT_EXPA = 0x61707865U, > - CHACHA20_CONSTANT_ND_3 = 0x3320646eU, > - CHACHA20_CONSTANT_2_BY = 0x79622d32U, > - CHACHA20_CONSTANT_TE_K = 0x6b206574U > -}; > - > -static inline uint32_t > -read_unaligned_32 (const uint8_t *p) > -{ > - uint32_t r; > - memcpy (&r, p, sizeof (r)); > - return r; > -} > - > -static inline void > -write_unaligned_32 (uint8_t *p, uint32_t v) > -{ > - memcpy (p, &v, sizeof (v)); > -} > - > -#if __BYTE_ORDER == __BIG_ENDIAN > -# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p)) > -# define set_state(v) __builtin_bswap32 ((v)) > -#else > -# define read_unaligned_le32(p) read_unaligned_32 ((p)) > -# define set_state(v) (v) > -#endif > - > -static inline void > -chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv) > -{ > - state[0] = CHACHA20_CONSTANT_EXPA; > - state[1] = CHACHA20_CONSTANT_ND_3; > - state[2] = CHACHA20_CONSTANT_2_BY; > - state[3] = CHACHA20_CONSTANT_TE_K; > - > - state[4] = read_unaligned_le32 (key + 0 * sizeof (uint32_t)); > - state[5] = read_unaligned_le32 (key + 1 * sizeof (uint32_t)); > - state[6] = read_unaligned_le32 (key + 2 * sizeof (uint32_t)); > - state[7] = read_unaligned_le32 (key + 3 * sizeof (uint32_t)); > - state[8] = read_unaligned_le32 (key + 4 * sizeof (uint32_t)); > - state[9] = read_unaligned_le32 (key 
+ 5 * sizeof (uint32_t)); > - state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t)); > - state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t)); > - > - state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t)); > - state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t)); > - state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t)); > - state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t)); > -} > - > -static inline uint32_t > -rotl32 (unsigned int shift, uint32_t word) > -{ > - return (word << (shift & 31)) | (word >> ((-shift) & 31)); > -} > - > -static void > -state_final (const uint8_t *src, uint8_t *dst, uint32_t v) > -{ > -#ifdef CHACHA20_XOR_FINAL > - v ^= read_unaligned_32 (src); > -#endif > - write_unaligned_32 (dst, v); > -} > - > -static inline void > -chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src) > -{ > - uint32_t x0, x1, x2, x3, x4, x5, x6, x7; > - uint32_t x8, x9, x10, x11, x12, x13, x14, x15; > - > - x0 = state[0]; > - x1 = state[1]; > - x2 = state[2]; > - x3 = state[3]; > - x4 = state[4]; > - x5 = state[5]; > - x6 = state[6]; > - x7 = state[7]; > - x8 = state[8]; > - x9 = state[9]; > - x10 = state[10]; > - x11 = state[11]; > - x12 = state[12]; > - x13 = state[13]; > - x14 = state[14]; > - x15 = state[15]; > - > - for (int i = 0; i < 20; i += 2) > - { > -#define QROUND(_x0, _x1, _x2, _x3) \ > - do { \ > - _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3)); \ > - _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2)); \ > - _x0 = _x0 + _x1; _x3 = rotl32 (8, (_x0 ^ _x3)); \ > - _x2 = _x2 + _x3; _x1 = rotl32 (7, (_x1 ^ _x2)); \ > - } while(0) > - > - QROUND (x0, x4, x8, x12); > - QROUND (x1, x5, x9, x13); > - QROUND (x2, x6, x10, x14); > - QROUND (x3, x7, x11, x15); > - > - QROUND (x0, x5, x10, x15); > - QROUND (x1, x6, x11, x12); > - QROUND (x2, x7, x8, x13); > - QROUND (x3, x4, x9, x14); > - } > - > - state_final (&src[0], &dst[0], set_state (x0 + state[0])); > - state_final (&src[4], &dst[4], set_state (x1 
+ state[1])); > - state_final (&src[8], &dst[8], set_state (x2 + state[2])); > - state_final (&src[12], &dst[12], set_state (x3 + state[3])); > - state_final (&src[16], &dst[16], set_state (x4 + state[4])); > - state_final (&src[20], &dst[20], set_state (x5 + state[5])); > - state_final (&src[24], &dst[24], set_state (x6 + state[6])); > - state_final (&src[28], &dst[28], set_state (x7 + state[7])); > - state_final (&src[32], &dst[32], set_state (x8 + state[8])); > - state_final (&src[36], &dst[36], set_state (x9 + state[9])); > - state_final (&src[40], &dst[40], set_state (x10 + state[10])); > - state_final (&src[44], &dst[44], set_state (x11 + state[11])); > - state_final (&src[48], &dst[48], set_state (x12 + state[12])); > - state_final (&src[52], &dst[52], set_state (x13 + state[13])); > - state_final (&src[56], &dst[56], set_state (x14 + state[14])); > - state_final (&src[60], &dst[60], set_state (x15 + state[15])); > - > - state[12]++; > -} > - > -static void > -__attribute_maybe_unused__ > -chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src, > - size_t bytes) > -{ > - while (bytes >= CHACHA20_BLOCK_SIZE) > - { > - chacha20_block (state, dst, src); > - > - bytes -= CHACHA20_BLOCK_SIZE; > - dst += CHACHA20_BLOCK_SIZE; > - src += CHACHA20_BLOCK_SIZE; > - } > - > - if (__glibc_unlikely (bytes != 0)) > - { > - uint8_t stream[CHACHA20_BLOCK_SIZE]; > - chacha20_block (state, stream, src); > - memcpy (dst, stream, bytes); > - explicit_bzero (stream, sizeof stream); > - } > -} > - > -/* Get the architecture optimized version. */ > -#include <chacha20_arch.h> > diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c > deleted file mode 100644 > index 45ba54920d..0000000000 > --- a/stdlib/tst-arc4random-chacha20.c > +++ /dev/null > @@ -1,167 +0,0 @@ > -/* Basic tests for chacha20 cypher used in arc4random. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. 
> - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#include <arc4random.h> > -#include <support/check.h> > -#include <sys/cdefs.h> > - > -/* The test does not define CHACHA20_XOR_FINAL to mimic what arc4random > - actual does. */ > -#include <chacha20.c> > - > -static int > -do_test (void) > -{ > - const uint8_t key[CHACHA20_KEY_SIZE] = > - { > - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, > - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, > - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, > - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, > - }; > - const uint8_t iv[CHACHA20_IV_SIZE] = > - { > - 0x0, 0x0, 0x0, 0x0, > - 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, > - }; > - const uint8_t expected1[CHACHA20_BUFSIZE] = > - { > - 0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a, > - 0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d, > - 0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda, > - 0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f, > - 0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1, > - 0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07, > - 0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73, > - 0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69, > - 0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 
0x29, 0xb7, 0x21, > - 0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8, > - 0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac, > - 0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6, > - 0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0, > - 0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3, > - 0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f, > - 0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e, > - 0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71, > - 0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3, > - 0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5, > - 0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34, > - 0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d, > - 0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c, > - 0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31, > - 0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad, > - 0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0, > - 0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72, > - 0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16, > - 0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9, > - 0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a, > - 0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc, > - 0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2, > - 0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb, > - 0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10, > - 0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b, > - 0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2, > - 0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17, > - 0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca, > - 0x4b, 0xc1, 0x10, 
0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84, > - 0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3, > - 0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1, > - 0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7, > - 0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9, > - 0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3, > - 0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e, > - 0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb, > - 0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8, > - 0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb > - }; > - > - const uint8_t expected2[CHACHA20_BUFSIZE] = > - { > - 0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36, > - 0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22, > - 0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50, > - 0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10, > - 0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef, > - 0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30, > - 0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c, > - 0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6, > - 0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e, > - 0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90, > - 0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3, > - 0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba, > - 0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4, > - 0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea, > - 0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a, > - 0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80, > - 0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2, > - 0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8, > - 0x67, 0xf9, 0xd6, 0x97, 
0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce, > - 0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb, > - 0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9, > - 0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc, > - 0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86, > - 0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0, > - 0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4, > - 0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1, > - 0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4, > - 0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f, > - 0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57, > - 0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97, > - 0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef, > - 0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13, > - 0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65, > - 0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30, > - 0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6, > - 0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee, > - 0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c, > - 0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66, > - 0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a, > - 0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c, > - 0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65, > - 0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b, > - 0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88, > - 0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1, > - 0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0, > - 0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36, > - 0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18 > - }; > - > - /* Check with 
the expected internal arc4random keystream buffer. Some > - architecture optimizations expects a buffer with a minimum size which > - is a multiple of then ChaCha20 blocksize, so they might not be prepared > - to handle smaller buffers. */ > - > - uint8_t output[CHACHA20_BUFSIZE]; > - > - uint32_t state[CHACHA20_STATE_LEN]; > - chacha20_init (state, key, iv); > - > - /* Check with the initial state. */ > - uint8_t input[CHACHA20_BUFSIZE] = { 0 }; > - > - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); > - TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE); > - > - /* And on the next round. */ > - chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); > - TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE); > - > - return 0; > -} > - > -#include <support/test-driver.c> > diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile > index 7dfd1b62dd..17fb1c5b72 100644 > --- a/sysdeps/aarch64/Makefile > +++ b/sysdeps/aarch64/Makefile > @@ -51,10 +51,6 @@ ifeq ($(subdir),csu) > gen-as-const-headers += tlsdesc.sym > endif > > -ifeq ($(subdir),stdlib) > -sysdep_routines += chacha20-aarch64 > -endif > - > ifeq ($(subdir),gmon) > CFLAGS-mcount.c += -mgeneral-regs-only > endif > diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S > deleted file mode 100644 > index cce5291c5c..0000000000 > --- a/sysdeps/aarch64/chacha20-aarch64.S > +++ /dev/null > @@ -1,314 +0,0 @@ > -/* Optimized AArch64 implementation of ChaCha20 cipher. > - Copyright (C) 2022 Free Software Foundation, Inc. > - > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. 
> - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -/* Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> > - > - This file is part of Libgcrypt. > - > - Libgcrypt is free software; you can redistribute it and/or modify > - it under the terms of the GNU Lesser General Public License as > - published by the Free Software Foundation; either version 2.1 of > - the License, or (at your option) any later version. > - > - Libgcrypt is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - GNU Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with this program; if not, see <https://www.gnu.org/licenses/>. > - */ > - > -/* Based on D. J. Bernstein reference implementation at > - http://cr.yp.to/chacha.html: > - > - chacha-regs.c version 20080118 > - D. J. Bernstein > - Public domain. */ > - > -#include <sysdep.h> > - > -/* Only LE is supported. 
*/ > -#ifdef __AARCH64EL__ > - > -#define GET_DATA_POINTER(reg, name) \ > - adrp reg, name ; \ > - add reg, reg, :lo12:name > - > -/* 'ret' instruction replacement for straight-line speculation mitigation */ > -#define ret_spec_stop \ > - ret; dsb sy; isb; > - > -.cpu generic+simd > - > -.text > - > -/* register macros */ > -#define INPUT x0 > -#define DST x1 > -#define SRC x2 > -#define NBLKS x3 > -#define ROUND x4 > -#define INPUT_CTR x5 > -#define INPUT_POS x6 > -#define CTR x7 > - > -/* vector registers */ > -#define X0 v16 > -#define X4 v17 > -#define X8 v18 > -#define X12 v19 > - > -#define X1 v20 > -#define X5 v21 > - > -#define X9 v22 > -#define X13 v23 > -#define X2 v24 > -#define X6 v25 > - > -#define X3 v26 > -#define X7 v27 > -#define X11 v28 > -#define X15 v29 > - > -#define X10 v30 > -#define X14 v31 > - > -#define VCTR v0 > -#define VTMP0 v1 > -#define VTMP1 v2 > -#define VTMP2 v3 > -#define VTMP3 v4 > -#define X12_TMP v5 > -#define X13_TMP v6 > -#define ROT8 v7 > - > -/********************************************************************** > - helper macros > - **********************************************************************/ > - > -#define _(...) 
__VA_ARGS__ > - > -#define vpunpckldq(s1, s2, dst) \ > - zip1 dst.4s, s2.4s, s1.4s; > - > -#define vpunpckhdq(s1, s2, dst) \ > - zip2 dst.4s, s2.4s, s1.4s; > - > -#define vpunpcklqdq(s1, s2, dst) \ > - zip1 dst.2d, s2.2d, s1.2d; > - > -#define vpunpckhqdq(s1, s2, dst) \ > - zip2 dst.2d, s2.2d, s1.2d; > - > -/* 4x4 32-bit integer matrix transpose */ > -#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \ > - vpunpckhdq(x1, x0, t2); \ > - vpunpckldq(x1, x0, x0); \ > - \ > - vpunpckldq(x3, x2, t1); \ > - vpunpckhdq(x3, x2, x2); \ > - \ > - vpunpckhqdq(t1, x0, x1); \ > - vpunpcklqdq(t1, x0, x0); \ > - \ > - vpunpckhqdq(x2, t2, x3); \ > - vpunpcklqdq(x2, t2, x2); > - > -/********************************************************************** > - 4-way chacha20 > - **********************************************************************/ > - > -#define XOR(d,s1,s2) \ > - eor d.16b, s2.16b, s1.16b; > - > -#define PLUS(ds,s) \ > - add ds.4s, ds.4s, s.4s; > - > -#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \ > - shl dst1.4s, src1.4s, #(c); \ > - shl dst2.4s, src2.4s, #(c); \ > - shl dst3.4s, src3.4s, #(c); \ > - shl dst4.4s, src4.4s, #(c); \ > - sri dst1.4s, src1.4s, #(32 - (c)); \ > - sri dst2.4s, src2.4s, #(32 - (c)); \ > - sri dst3.4s, src3.4s, #(32 - (c)); \ > - sri dst4.4s, src4.4s, #(32 - (c)); > - > -#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ > - tbl dst1.16b, {src1.16b}, ROT8.16b; \ > - tbl dst2.16b, {src2.16b}, ROT8.16b; \ > - tbl dst3.16b, {src3.16b}, ROT8.16b; \ > - tbl dst4.16b, {src4.16b}, ROT8.16b; > - > -#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \ > - rev32 dst1.8h, src1.8h; \ > - rev32 dst2.8h, src2.8h; \ > - rev32 dst3.8h, src3.8h; \ > - rev32 dst4.8h, src4.8h; > - > -#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \ > - PLUS(a1,b1); PLUS(a2,b2); \ > - PLUS(a3,b3); PLUS(a4,b4); \ > - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ > - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ > - 
ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4); \ > - PLUS(c1,d1); PLUS(c2,d2); \ > - PLUS(c3,d3); PLUS(c4,d4); \ > - XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \ > - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ > - ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4) \ > - PLUS(a1,b1); PLUS(a2,b2); \ > - PLUS(a3,b3); PLUS(a4,b4); \ > - XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \ > - XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \ > - ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4) \ > - PLUS(c1,d1); PLUS(c2,d2); \ > - PLUS(c3,d3); PLUS(c4,d4); \ > - XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \ > - XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \ > - ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4) \ > - > -.align 4 > -L(__chacha20_blocks4_data_inc_counter): > - .long 0,1,2,3 > - > -.align 4 > -L(__chacha20_blocks4_data_rot8): > - .byte 3,0,1,2 > - .byte 7,4,5,6 > - .byte 11,8,9,10 > - .byte 15,12,13,14 > - > -.hidden __chacha20_neon_blocks4 > -ENTRY (__chacha20_neon_blocks4) > - /* input: > - * x0: input > - * x1: dst > - * x2: src > - * x3: nblks (multiple of 4) > - */ > - > - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8)) > - add INPUT_CTR, INPUT, #(12*4); > - ld1 {ROT8.16b}, [CTR]; > - GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter)) > - mov INPUT_POS, INPUT; > - ld1 {VCTR.16b}, [CTR]; > - > -L(loop4): > - /* Construct counter vectors X12 and X13 */ > - > - ld1 {X15.16b}, [INPUT_CTR]; > - mov ROUND, #20; > - ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS]; > - > - dup X12.4s, X15.s[0]; > - dup X13.4s, X15.s[1]; > - ldr CTR, [INPUT_CTR]; > - add X12.4s, X12.4s, VCTR.4s; > - dup X0.4s, VTMP1.s[0]; > - dup X1.4s, VTMP1.s[1]; > - dup X2.4s, VTMP1.s[2]; > - dup X3.4s, VTMP1.s[3]; > - dup X14.4s, X15.s[2]; > - cmhi VTMP0.4s, VCTR.4s, X12.4s; > - dup X15.4s, X15.s[3]; > - add CTR, CTR, #4; /* Update counter */ > - dup X4.4s, VTMP2.s[0]; > - dup X5.4s, VTMP2.s[1]; > - dup X6.4s, VTMP2.s[2]; > - dup X7.4s, VTMP2.s[3]; > - sub X13.4s, X13.4s, VTMP0.4s; > - dup X8.4s, VTMP3.s[0]; > - dup X9.4s, VTMP3.s[1]; > - 
dup X10.4s, VTMP3.s[2]; > - dup X11.4s, VTMP3.s[3]; > - mov X12_TMP.16b, X12.16b; > - mov X13_TMP.16b, X13.16b; > - str CTR, [INPUT_CTR]; > - > -L(round2): > - subs ROUND, ROUND, #2 > - QUARTERROUND4(X0, X4, X8, X12, X1, X5, X9, X13, > - X2, X6, X10, X14, X3, X7, X11, X15, > - tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) > - QUARTERROUND4(X0, X5, X10, X15, X1, X6, X11, X12, > - X2, X7, X8, X13, X3, X4, X9, X14, > - tmp:=,VTMP0,VTMP1,VTMP2,VTMP3) > - b.ne L(round2); > - > - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32; > - > - PLUS(X12, X12_TMP); /* INPUT + 12 * 4 + counter */ > - PLUS(X13, X13_TMP); /* INPUT + 13 * 4 + counter */ > - > - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */ > - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */ > - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */ > - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */ > - PLUS(X0, VTMP2); > - PLUS(X1, VTMP3); > - PLUS(X2, X12_TMP); > - PLUS(X3, X13_TMP); > - > - dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */ > - dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */ > - dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */ > - dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */ > - ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS]; > - mov INPUT_POS, INPUT; > - PLUS(X4, VTMP2); > - PLUS(X5, VTMP3); > - PLUS(X6, X12_TMP); > - PLUS(X7, X13_TMP); > - > - dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */ > - dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */ > - dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */ > - dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */ > - dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */ > - dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */ > - PLUS(X8, VTMP2); > - PLUS(X9, VTMP3); > - PLUS(X10, X12_TMP); > - PLUS(X11, X13_TMP); > - PLUS(X14, VTMP0); > - PLUS(X15, VTMP1); > - > - transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2); > - transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2); > - transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2); > - transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2); > - > - subs NBLKS, NBLKS, #4; > 
- > - st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64 > - st1 {X1.16b,X5.16b}, [DST], #32; > - st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64 > - st1 {X10.16b,X14.16b}, [DST], #32; > - st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64; > - > - b.ne L(loop4); > - > - ret_spec_stop > -END (__chacha20_neon_blocks4) > - > -#endif > diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h > deleted file mode 100644 > index 37dbb917f1..0000000000 > --- a/sysdeps/aarch64/chacha20_arch.h > +++ /dev/null > @@ -1,40 +0,0 @@ > -/* Chacha20 implementation, used on arc4random. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. 
*/ > - > -#include <ldsodefs.h> > -#include <stdbool.h> > - > -unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t nblks) > - attribute_hidden; > - > -static void > -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, > - size_t bytes) > -{ > - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, > - "CHACHA20_BUFSIZE not multiple of 4"); > - _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4, > - "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4"); > -#ifdef __AARCH64EL__ > - __chacha20_neon_blocks4 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > -#else > - chacha20_crypt_generic (state, dst, src, bytes); > -#endif > -} > diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h > deleted file mode 100644 > index 1b4559ccbc..0000000000 > --- a/sysdeps/generic/chacha20_arch.h > +++ /dev/null > @@ -1,24 +0,0 @@ > -/* Chacha20 implementation, generic interface for encrypt. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. 
*/ > - > -static inline void > -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, > - size_t bytes) > -{ > - chacha20_crypt_generic (state, dst, src, bytes); > -} > diff --git a/sysdeps/generic/not-cancel.h b/sysdeps/generic/not-cancel.h > index acceb9b67f..b5a42c70d6 100644 > --- a/sysdeps/generic/not-cancel.h > +++ b/sysdeps/generic/not-cancel.h > @@ -20,6 +20,7 @@ > # define NOT_CANCEL_H > > #include <fcntl.h> > +#include <poll.h> > #include <unistd.h> > #include <sys/wait.h> > #include <time.h> > @@ -50,5 +51,7 @@ > __fcntl64 (fd, cmd, __VA_ARGS__) > #define __getrandom_nocancel(buf, size, flags) \ > __getrandom (buf, size, flags) > +#define __poll_infinity_nocancel(fds, nfds) \ > + __poll (fds, nfds, -1) > > #endif /* NOT_CANCEL_H */ > diff --git a/sysdeps/generic/tls-internal-struct.h b/sysdeps/generic/tls-internal-struct.h > index a91915831b..d76c715a96 100644 > --- a/sysdeps/generic/tls-internal-struct.h > +++ b/sysdeps/generic/tls-internal-struct.h > @@ -23,7 +23,6 @@ struct tls_internal_t > { > char *strsignal_buf; > char *strerror_l_buf; > - struct arc4random_state_t *rand_state; > }; > > #endif Ok. > diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c > index 8a0f37d509..b32b31b5a9 100644 > --- a/sysdeps/generic/tls-internal.c > +++ b/sysdeps/generic/tls-internal.c > @@ -16,7 +16,6 @@ > License along with the GNU C Library; if not, see > <https://www.gnu.org/licenses/>. */ > > -#include <stdlib/arc4random.h> > #include <string.h> > #include <tls-internal.h> > > @@ -27,13 +26,4 @@ __glibc_tls_internal_free (void) > { > free (__tls_internal.strsignal_buf); > free (__tls_internal.strerror_l_buf); > - > - if (__tls_internal.rand_state != NULL) > - { > - /* Clear any lingering random state prior so if the thread stack is > - cached it won't leak any data. */ > - explicit_bzero (__tls_internal.rand_state, > - sizeof (*__tls_internal.rand_state)); > - free (__tls_internal.rand_state); > - } > } Ok. 
> diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c > index 667068c8cf..e60b86fab1 100644 > --- a/sysdeps/mach/hurd/_Fork.c > +++ b/sysdeps/mach/hurd/_Fork.c > @@ -662,8 +662,6 @@ retry: > _hurd_malloc_fork_child (); > call_function_static_weak (__malloc_fork_unlock_child); > > - call_function_static_weak (__arc4random_fork_subprocess); > - > /* Run things that want to run in the child task to set up. */ > RUN_HOOK (_hurd_fork_child_hook, ()); > Ok. > diff --git a/sysdeps/mach/hurd/not-cancel.h b/sysdeps/mach/hurd/not-cancel.h > index 9a3a7ed59a..ae58b734e3 100644 > --- a/sysdeps/mach/hurd/not-cancel.h > +++ b/sysdeps/mach/hurd/not-cancel.h > @@ -21,6 +21,7 @@ > > #include <fcntl.h> > #include <unistd.h> > +#include <poll.h> > #include <sys/wait.h> > #include <time.h> > #include <sys/uio.h> > @@ -77,6 +78,9 @@ __typeof (__fcntl) __fcntl_nocancel; > #define __getrandom_nocancel(buf, size, flags) \ > __getrandom (buf, size, flags) > > +#define __poll_infinity_nocancel(fds, nfds) \ > + __poll (fds, nfds, -1) > + > #if IS_IN (libc) > hidden_proto (__close_nocancel) > hidden_proto (__close_nocancel_nostatus) > diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c > index 7dc02569f6..dd568992e2 100644 > --- a/sysdeps/nptl/_Fork.c > +++ b/sysdeps/nptl/_Fork.c > @@ -43,8 +43,6 @@ _Fork (void) > self->robust_head.list = &self->robust_head; > INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head, > sizeof (struct robust_list_head)); > - > - call_function_static_weak (__arc4random_fork_subprocess); > } > return pid; > } Ok. 
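With the per-thread ChaCha20 state gone, there is nothing left to reset in the child after fork, which is why both _Fork.c hooks can simply be dropped: every call goes back to the kernel. A minimal sketch of that shape follows. It is not the glibc code. It uses the public getrandom, poll and read calls instead of the internal __*_nocancel wrappers, and the names fill_random, fill_from_urandom and random_u32 are hypothetical. Waiting on /dev/random with an infinite poll timeout before reading /dev/urandom is my assumption about what the new __poll_infinity_nocancel macro is for.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback for kernels without getrandom: block until
   /dev/random is readable (timeout -1, as __poll_infinity_nocancel
   does), then read the bytes from /dev/urandom.  */
static void
fill_from_urandom (uint8_t *p, size_t n)
{
  struct pollfd pfd = { .events = POLLIN };
  pfd.fd = open ("/dev/random", O_RDONLY | O_CLOEXEC);
  if (pfd.fd < 0)
    abort ();
  while (poll (&pfd, 1, -1) < 0)
    if (errno != EINTR)
      abort ();
  close (pfd.fd);

  int fd = open ("/dev/urandom", O_RDONLY | O_CLOEXEC);
  if (fd < 0)
    abort ();
  while (n > 0)
    {
      ssize_t l = read (fd, p, n);
      if (l < 0 && errno == EINTR)
        continue;               /* Interrupted, try again.  */
      if (l <= 0)
        abort ();               /* Hard error or unexpected EOF.  */
      p += l;                   /* Short read: advance and continue.  */
      n -= (size_t) l;
    }
  close (fd);
}

/* Hypothetical stand-in for arc4random_buf: no userspace state, so
   nothing to reseed on fork or to wipe at thread exit.  */
static void
fill_random (void *buf, size_t n)
{
  assert (buf != NULL || n == 0);
  uint8_t *p = buf;
  while (n > 0)
    {
      ssize_t l = getrandom (p, n, 0);
      if (l > 0)
        {
          p += l;
          n -= (size_t) l;
        }
      else if (l < 0 && errno == ENOSYS)
        {
          fill_from_urandom (p, n);
          return;
        }
      else if (l == 0 || errno != EINTR)
        abort ();
    }
}

/* Hypothetical stand-in for arc4random: a thin wrapper, mirroring the
   one-line body the patch leaves in __arc4random.  */
static uint32_t
random_u32 (void)
{
  uint32_t r;
  fill_random (&r, sizeof r);
  return r;
}
```

The sketch aborts on any unrecoverable failure, so a caller never sees a partially filled buffer. Whether abort is the right policy for the real function is a separate question.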
> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile > deleted file mode 100644 > index 8c75165f7f..0000000000 > --- a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile > +++ /dev/null > @@ -1,4 +0,0 @@ > -ifeq ($(subdir),stdlib) > -sysdep_routines += chacha20-ppc > -CFLAGS-chacha20-ppc.c += -mcpu=power8 > -endif > diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c > deleted file mode 100644 > index cf9e735326..0000000000 > --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c > +++ /dev/null > @@ -1 +0,0 @@ > -#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c> > diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h > deleted file mode 100644 > index 08494dc045..0000000000 > --- a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h > +++ /dev/null > @@ -1,42 +0,0 @@ > -/* PowerPC optimization for ChaCha20. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. 
*/ > - > -#include <stdbool.h> > -#include <ldsodefs.h> > - > -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t nblks) > - attribute_hidden; > - > -static void > -chacha20_crypt (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t bytes) > -{ > - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, > - "CHACHA20_BUFSIZE not multiple of 4"); > - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, > - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); > - > - unsigned long int hwcap = GLRO(dl_hwcap); > - unsigned long int hwcap2 = GLRO(dl_hwcap2); > - if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC) > - __chacha20_power8_blocks4 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > - else > - chacha20_crypt_generic (state, dst, src, bytes); > -} > diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile > index abb0aa3f11..71a59529f3 100644 > --- a/sysdeps/powerpc/powerpc64/power8/Makefile > +++ b/sysdeps/powerpc/powerpc64/power8/Makefile > @@ -1,8 +1,3 @@ > ifeq ($(subdir),string) > sysdep_routines += strcasestr-ppc64 > endif > - > -ifeq ($(subdir),stdlib) > -sysdep_routines += chacha20-ppc > -CFLAGS-chacha20-ppc.c += -mcpu=power8 > -endif Ok. > diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c > deleted file mode 100644 > index 0bbdcb9363..0000000000 > --- a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c > +++ /dev/null > @@ -1,256 +0,0 @@ > -/* Optimized PowerPC implementation of ChaCha20 cipher. > - Copyright (C) 2022 Free Software Foundation, Inc. > - > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. 
> - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20 > - Copyright (C) 2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> > - > - This file is part of Libgcrypt. > - > - Libgcrypt is free software; you can redistribute it and/or modify > - it under the terms of the GNU Lesser General Public License as > - published by the Free Software Foundation; either version 2.1 of > - the License, or (at your option) any later version. > - > - Libgcrypt is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - GNU Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with this program; if not, see <https://www.gnu.org/licenses/>. 
> - */ > - > -#include <altivec.h> > -#include <endian.h> > -#include <stddef.h> > -#include <stdint.h> > -#include <sys/cdefs.h> > - > -typedef vector unsigned char vector16x_u8; > -typedef vector unsigned int vector4x_u32; > -typedef vector unsigned long long vector2x_u64; > - > -#if __BYTE_ORDER == __BIG_ENDIAN > -static const vector16x_u8 le_bswap_const = > - { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 }; > -#endif > - > -static inline vector4x_u32 > -vec_rol_elems (vector4x_u32 v, unsigned int idx) > -{ > -#if __BYTE_ORDER != __BIG_ENDIAN > - return vec_sld (v, v, (16 - (4 * idx)) & 15); > -#else > - return vec_sld (v, v, (4 * idx) & 15); > -#endif > -} > - > -static inline vector4x_u32 > -vec_load_le (unsigned long offset, const unsigned char *ptr) > -{ > - vector4x_u32 vec; > - vec = vec_vsx_ld (offset, (const uint32_t *)ptr); > -#if __BYTE_ORDER == __BIG_ENDIAN > - vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec, > - le_bswap_const); > -#endif > - return vec; > -} > - > -static inline void > -vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr) > -{ > -#if __BYTE_ORDER == __BIG_ENDIAN > - vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec, > - le_bswap_const); > -#endif > - vec_vsx_st (vec, offset, (uint32_t *)ptr); > -} > - > - > -static inline vector4x_u32 > -vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a) > -{ > -#if __BYTE_ORDER == __BIG_ENDIAN > - static const vector16x_u8 swap32 = > - { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 }; > - vector2x_u64 vec, add, sum; > - > - vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32); > - add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32); > - sum = vec + add; > - return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32); > -#else > - return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a)); > -#endif > -} > - > 
-/********************************************************************** > - 4-way chacha20 > - **********************************************************************/ > - > -#define ROTATE(v1,rolv) \ > - __asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv)) > - > -#define PLUS(ds,s) \ > - ((ds) += (s)) > - > -#define XOR(ds,s) \ > - ((ds) ^= (s)) > - > -#define ADD_U64(v,a) \ > - (v = vec_add_ctr_u64(v, a)) > - > -/* 4x4 32-bit integer matrix transpose */ > -#define transpose_4x4(x0, x1, x2, x3) ({ \ > - vector4x_u32 t1 = vec_mergeh(x0, x2); \ > - vector4x_u32 t2 = vec_mergel(x0, x2); \ > - vector4x_u32 t3 = vec_mergeh(x1, x3); \ > - x3 = vec_mergel(x1, x3); \ > - x0 = vec_mergeh(t1, t3); \ > - x1 = vec_mergel(t1, t3); \ > - x2 = vec_mergeh(t2, x3); \ > - x3 = vec_mergel(t2, x3); \ > - }) > - > -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2) \ > - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ > - ROTATE(d1, rotate_16); ROTATE(d2, rotate_16); \ > - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ > - ROTATE(b1, rotate_12); ROTATE(b2, rotate_12); \ > - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ > - ROTATE(d1, rotate_8); ROTATE(d2, rotate_8); \ > - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ > - ROTATE(b1, rotate_7); ROTATE(b2, rotate_7); > - > -unsigned int attribute_hidden > -__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src, > - size_t nblks) > -{ > - vector4x_u32 counters_0123 = { 0, 1, 2, 3 }; > - vector4x_u32 counter_4 = { 4, 0, 0, 0 }; > - vector4x_u32 rotate_16 = { 16, 16, 16, 16 }; > - vector4x_u32 rotate_12 = { 12, 12, 12, 12 }; > - vector4x_u32 rotate_8 = { 8, 8, 8, 8 }; > - vector4x_u32 rotate_7 = { 7, 7, 7, 7 }; > - vector4x_u32 state0, state1, state2, state3; > - vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7; > - vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15; > - vector4x_u32 tmp; > - int i; > - > - /* Force preload of constants to vector registers. 
*/ > - __asm__ ("": "+v" (counters_0123) :: "memory"); > - __asm__ ("": "+v" (counter_4) :: "memory"); > - __asm__ ("": "+v" (rotate_16) :: "memory"); > - __asm__ ("": "+v" (rotate_12) :: "memory"); > - __asm__ ("": "+v" (rotate_8) :: "memory"); > - __asm__ ("": "+v" (rotate_7) :: "memory"); > - > - state0 = vec_vsx_ld (0 * 16, state); > - state1 = vec_vsx_ld (1 * 16, state); > - state2 = vec_vsx_ld (2 * 16, state); > - state3 = vec_vsx_ld (3 * 16, state); > - > - do > - { > - v0 = vec_splat (state0, 0); > - v1 = vec_splat (state0, 1); > - v2 = vec_splat (state0, 2); > - v3 = vec_splat (state0, 3); > - v4 = vec_splat (state1, 0); > - v5 = vec_splat (state1, 1); > - v6 = vec_splat (state1, 2); > - v7 = vec_splat (state1, 3); > - v8 = vec_splat (state2, 0); > - v9 = vec_splat (state2, 1); > - v10 = vec_splat (state2, 2); > - v11 = vec_splat (state2, 3); > - v12 = vec_splat (state3, 0); > - v13 = vec_splat (state3, 1); > - v14 = vec_splat (state3, 2); > - v15 = vec_splat (state3, 3); > - > - v12 += counters_0123; > - v13 -= vec_cmplt (v12, counters_0123); > - > - for (i = 20; i > 0; i -= 2) > - { > - QUARTERROUND2 (v0, v4, v8, v12, v1, v5, v9, v13) > - QUARTERROUND2 (v2, v6, v10, v14, v3, v7, v11, v15) > - QUARTERROUND2 (v0, v5, v10, v15, v1, v6, v11, v12) > - QUARTERROUND2 (v2, v7, v8, v13, v3, v4, v9, v14) > - } > - > - v0 += vec_splat (state0, 0); > - v1 += vec_splat (state0, 1); > - v2 += vec_splat (state0, 2); > - v3 += vec_splat (state0, 3); > - v4 += vec_splat (state1, 0); > - v5 += vec_splat (state1, 1); > - v6 += vec_splat (state1, 2); > - v7 += vec_splat (state1, 3); > - v8 += vec_splat (state2, 0); > - v9 += vec_splat (state2, 1); > - v10 += vec_splat (state2, 2); > - v11 += vec_splat (state2, 3); > - tmp = vec_splat( state3, 0); > - tmp += counters_0123; > - v12 += tmp; > - v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123); > - v14 += vec_splat (state3, 2); > - v15 += vec_splat (state3, 3); > - ADD_U64 (state3, counter_4); > - > - transpose_4x4 
(v0, v1, v2, v3); > - transpose_4x4 (v4, v5, v6, v7); > - transpose_4x4 (v8, v9, v10, v11); > - transpose_4x4 (v12, v13, v14, v15); > - > - vec_store_le (v0, (64 * 0 + 16 * 0), dst); > - vec_store_le (v1, (64 * 1 + 16 * 0), dst); > - vec_store_le (v2, (64 * 2 + 16 * 0), dst); > - vec_store_le (v3, (64 * 3 + 16 * 0), dst); > - > - vec_store_le (v4, (64 * 0 + 16 * 1), dst); > - vec_store_le (v5, (64 * 1 + 16 * 1), dst); > - vec_store_le (v6, (64 * 2 + 16 * 1), dst); > - vec_store_le (v7, (64 * 3 + 16 * 1), dst); > - > - vec_store_le (v8, (64 * 0 + 16 * 2), dst); > - vec_store_le (v9, (64 * 1 + 16 * 2), dst); > - vec_store_le (v10, (64 * 2 + 16 * 2), dst); > - vec_store_le (v11, (64 * 3 + 16 * 2), dst); > - > - vec_store_le (v12, (64 * 0 + 16 * 3), dst); > - vec_store_le (v13, (64 * 1 + 16 * 3), dst); > - vec_store_le (v14, (64 * 2 + 16 * 3), dst); > - vec_store_le (v15, (64 * 3 + 16 * 3), dst); > - > - src += 4*64; > - dst += 4*64; > - > - nblks -= 4; > - } > - while (nblks); > - > - vec_vsx_st (state3, 3 * 16, state); > - > - return 0; > -} > diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h > deleted file mode 100644 > index ded06762b6..0000000000 > --- a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h > +++ /dev/null > @@ -1,37 +0,0 @@ > -/* PowerPC optimization for ChaCha20. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. 
> - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#include <stdbool.h> > -#include <ldsodefs.h> > - > -unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t nblks) > - attribute_hidden; > - > -static void > -chacha20_crypt (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t bytes) > -{ > - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, > - "CHACHA20_BUFSIZE not multiple of 4"); > - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, > - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); > - > - __chacha20_power8_blocks4 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > -} > diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile > index 96c110f490..66ed844e68 100644 > --- a/sysdeps/s390/s390-64/Makefile > +++ b/sysdeps/s390/s390-64/Makefile > @@ -67,9 +67,3 @@ tests-container += tst-glibc-hwcaps-cache > endif > > endif # $(subdir) == elf > - > -ifeq ($(subdir),stdlib) > -sysdep_routines += \ > - chacha20-s390x \ > - # sysdep_routines > -endif > diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S > deleted file mode 100644 > index e38504d370..0000000000 > --- a/sysdeps/s390/s390-64/chacha20-s390x.S > +++ /dev/null > @@ -1,573 +0,0 @@ > -/* Optimized s390x implementation of ChaCha20 cipher. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. 
> - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -/* chacha20-s390x.S - zSeries implementation of ChaCha20 cipher > - > - Copyright (C) 2020 Jussi Kivilinna <jussi.kivilinna@iki.fi> > - > - This file is part of Libgcrypt. > - > - Libgcrypt is free software; you can redistribute it and/or modify > - it under the terms of the GNU Lesser General Public License as > - published by the Free Software Foundation; either version 2.1 of > - the License, or (at your option) any later version. > - > - Libgcrypt is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - GNU Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with this program; if not, see <https://www.gnu.org/licenses/>. > - */ > - > -#include <sysdep.h> > - > -#ifdef HAVE_S390_VX_ASM_SUPPORT > - > -/* CFA expressions are used for pointing CFA and registers to > - * SP relative offsets. */ > -# define DW_REGNO_SP 15 > - > -/* Fixed length encoding used for integers for now. 
*/ > -# define DW_SLEB128_7BIT(value) \ > - 0x00|((value) & 0x7f) > -# define DW_SLEB128_28BIT(value) \ > - 0x80|((value)&0x7f), \ > - 0x80|(((value)>>7)&0x7f), \ > - 0x80|(((value)>>14)&0x7f), \ > - 0x00|(((value)>>21)&0x7f) > - > -# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \ > - .cfi_escape \ > - 0x0f, /* DW_CFA_def_cfa_expression */ \ > - DW_SLEB128_7BIT(11), /* length */ \ > - 0x7f, /* DW_OP_breg15, rsp + constant */ \ > - DW_SLEB128_28BIT(rsp_offs), \ > - 0x06, /* DW_OP_deref */ \ > - 0x23, /* DW_OP_plus_constu */ \ > - DW_SLEB128_28BIT((cfa_depth)+160) > - > -.machine "z13+vx" > -.text > - > -.balign 16 > -.Lconsts: > -.Lwordswap: > - .byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3 > -.Lbswap128: > - .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 > -.Lbswap32: > - .byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 > -.Lone: > - .long 0, 0, 0, 1 > -.Ladd_counter_0123: > - .long 0, 1, 2, 3 > -.Ladd_counter_4567: > - .long 4, 5, 6, 7 > - > -/* register macros */ > -#define INPUT %r2 > -#define DST %r3 > -#define SRC %r4 > -#define NBLKS %r0 > -#define ROUND %r1 > - > -/* stack structure */ > - > -#define STACK_FRAME_STD (8 * 16 + 8 * 4) > -#define STACK_FRAME_F8_F15 (8 * 8) > -#define STACK_FRAME_Y0_Y15 (16 * 16) > -#define STACK_FRAME_CTR (4 * 16) > -#define STACK_FRAME_PARAMS (6 * 8) > - > -#define STACK_MAX (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \ > - STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \ > - STACK_FRAME_PARAMS) > - > -#define STACK_F8 (STACK_MAX - STACK_FRAME_F8_F15) > -#define STACK_F9 (STACK_F8 + 8) > -#define STACK_F10 (STACK_F9 + 8) > -#define STACK_F11 (STACK_F10 + 8) > -#define STACK_F12 (STACK_F11 + 8) > -#define STACK_F13 (STACK_F12 + 8) > -#define STACK_F14 (STACK_F13 + 8) > -#define STACK_F15 (STACK_F14 + 8) > -#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15) > -#define STACK_CTR (STACK_Y0_Y15 - STACK_FRAME_CTR) > -#define STACK_INPUT (STACK_CTR - STACK_FRAME_PARAMS) > -#define STACK_DST (STACK_INPUT + 8) > 
-#define STACK_SRC (STACK_DST + 8) > -#define STACK_NBLKS (STACK_SRC + 8) > -#define STACK_POCTX (STACK_NBLKS + 8) > -#define STACK_POSRC (STACK_POCTX + 8) > - > -#define STACK_G0_H3 STACK_Y0_Y15 > - > -/* vector registers */ > -#define A0 %v0 > -#define A1 %v1 > -#define A2 %v2 > -#define A3 %v3 > - > -#define B0 %v4 > -#define B1 %v5 > -#define B2 %v6 > -#define B3 %v7 > - > -#define C0 %v8 > -#define C1 %v9 > -#define C2 %v10 > -#define C3 %v11 > - > -#define D0 %v12 > -#define D1 %v13 > -#define D2 %v14 > -#define D3 %v15 > - > -#define E0 %v16 > -#define E1 %v17 > -#define E2 %v18 > -#define E3 %v19 > - > -#define F0 %v20 > -#define F1 %v21 > -#define F2 %v22 > -#define F3 %v23 > - > -#define G0 %v24 > -#define G1 %v25 > -#define G2 %v26 > -#define G3 %v27 > - > -#define H0 %v28 > -#define H1 %v29 > -#define H2 %v30 > -#define H3 %v31 > - > -#define IO0 E0 > -#define IO1 E1 > -#define IO2 E2 > -#define IO3 E3 > -#define IO4 F0 > -#define IO5 F1 > -#define IO6 F2 > -#define IO7 F3 > - > -#define S0 G0 > -#define S1 G1 > -#define S2 G2 > -#define S3 G3 > - > -#define TMP0 H0 > -#define TMP1 H1 > -#define TMP2 H2 > -#define TMP3 H3 > - > -#define X0 A0 > -#define X1 A1 > -#define X2 A2 > -#define X3 A3 > -#define X4 B0 > -#define X5 B1 > -#define X6 B2 > -#define X7 B3 > -#define X8 C0 > -#define X9 C1 > -#define X10 C2 > -#define X11 C3 > -#define X12 D0 > -#define X13 D1 > -#define X14 D2 > -#define X15 D3 > - > -#define Y0 E0 > -#define Y1 E1 > -#define Y2 E2 > -#define Y3 E3 > -#define Y4 F0 > -#define Y5 F1 > -#define Y6 F2 > -#define Y7 F3 > -#define Y8 G0 > -#define Y9 G1 > -#define Y10 G2 > -#define Y11 G3 > -#define Y12 H0 > -#define Y13 H1 > -#define Y14 H2 > -#define Y15 H3 > - > -/********************************************************************** > - helper macros > - **********************************************************************/ > - > -#define _ /*_*/ > - > -#define START_STACK(last_r) \ > - lgr %r0, %r15; \ > - lghi %r1, ~15; \ > - stmg 
%r6, last_r, 6 * 8(%r15); \ > - aghi %r0, -STACK_MAX; \ > - ngr %r0, %r1; \ > - lgr %r1, %r15; \ > - cfi_def_cfa_register(1); \ > - lgr %r15, %r0; \ > - stg %r1, 0(%r15); \ > - cfi_cfa_on_stack(0, 0); \ > - std %f8, STACK_F8(%r15); \ > - std %f9, STACK_F9(%r15); \ > - std %f10, STACK_F10(%r15); \ > - std %f11, STACK_F11(%r15); \ > - std %f12, STACK_F12(%r15); \ > - std %f13, STACK_F13(%r15); \ > - std %f14, STACK_F14(%r15); \ > - std %f15, STACK_F15(%r15); > - > -#define END_STACK(last_r) \ > - lg %r1, 0(%r15); \ > - ld %f8, STACK_F8(%r15); \ > - ld %f9, STACK_F9(%r15); \ > - ld %f10, STACK_F10(%r15); \ > - ld %f11, STACK_F11(%r15); \ > - ld %f12, STACK_F12(%r15); \ > - ld %f13, STACK_F13(%r15); \ > - ld %f14, STACK_F14(%r15); \ > - ld %f15, STACK_F15(%r15); \ > - lmg %r6, last_r, 6 * 8(%r1); \ > - lgr %r15, %r1; \ > - cfi_def_cfa_register(DW_REGNO_SP); > - > -#define PLUS(dst,src) \ > - vaf dst, dst, src; > - > -#define XOR(dst,src) \ > - vx dst, dst, src; > - > -#define ROTATE(v1,c) \ > - verllf v1, v1, (c)(0); > - > -#define WORD_ROTATE(v1,s) \ > - vsldb v1, v1, v1, ((s) * 4); > - > -#define DST_8(OPER, I, J) \ > - OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \ > - OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J); > - > -/********************************************************************** > - round macros > - **********************************************************************/ > - > -/********************************************************************** > - 8-way chacha20 ("vertical") > - **********************************************************************/ > - > -#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ > - x8,x9,x10,x11,x12,x13,x14,x15,\ > - y0,y1,y2,y3,y4,y5,y6,y7,\ > - y8,y9,y10,y11,y12,y13,y14,y15,\ > - op1,op2,op3,op4,op5,op6,op7,op8,\ > - op9,op10,op11,op12) \ > - op1; \ > - PLUS(x0, x1); PLUS(x4, x5); \ > - PLUS(x8, x9); PLUS(x12, x13); \ > - PLUS(y0, y1); PLUS(y4, y5); \ > - PLUS(y8, y9); PLUS(y12, y13); 
\ > - op2; \ > - XOR(x3, x0); XOR(x7, x4); \ > - XOR(x11, x8); XOR(x15, x12); \ > - XOR(y3, y0); XOR(y7, y4); \ > - XOR(y11, y8); XOR(y15, y12); \ > - op3; \ > - ROTATE(x3, 16); ROTATE(x7, 16); \ > - ROTATE(x11, 16); ROTATE(x15, 16); \ > - ROTATE(y3, 16); ROTATE(y7, 16); \ > - ROTATE(y11, 16); ROTATE(y15, 16); \ > - op4; \ > - PLUS(x2, x3); PLUS(x6, x7); \ > - PLUS(x10, x11); PLUS(x14, x15); \ > - PLUS(y2, y3); PLUS(y6, y7); \ > - PLUS(y10, y11); PLUS(y14, y15); \ > - op5; \ > - XOR(x1, x2); XOR(x5, x6); \ > - XOR(x9, x10); XOR(x13, x14); \ > - XOR(y1, y2); XOR(y5, y6); \ > - XOR(y9, y10); XOR(y13, y14); \ > - op6; \ > - ROTATE(x1,12); ROTATE(x5,12); \ > - ROTATE(x9,12); ROTATE(x13,12); \ > - ROTATE(y1,12); ROTATE(y5,12); \ > - ROTATE(y9,12); ROTATE(y13,12); \ > - op7; \ > - PLUS(x0, x1); PLUS(x4, x5); \ > - PLUS(x8, x9); PLUS(x12, x13); \ > - PLUS(y0, y1); PLUS(y4, y5); \ > - PLUS(y8, y9); PLUS(y12, y13); \ > - op8; \ > - XOR(x3, x0); XOR(x7, x4); \ > - XOR(x11, x8); XOR(x15, x12); \ > - XOR(y3, y0); XOR(y7, y4); \ > - XOR(y11, y8); XOR(y15, y12); \ > - op9; \ > - ROTATE(x3,8); ROTATE(x7,8); \ > - ROTATE(x11,8); ROTATE(x15,8); \ > - ROTATE(y3,8); ROTATE(y7,8); \ > - ROTATE(y11,8); ROTATE(y15,8); \ > - op10; \ > - PLUS(x2, x3); PLUS(x6, x7); \ > - PLUS(x10, x11); PLUS(x14, x15); \ > - PLUS(y2, y3); PLUS(y6, y7); \ > - PLUS(y10, y11); PLUS(y14, y15); \ > - op11; \ > - XOR(x1, x2); XOR(x5, x6); \ > - XOR(x9, x10); XOR(x13, x14); \ > - XOR(y1, y2); XOR(y5, y6); \ > - XOR(y9, y10); XOR(y13, y14); \ > - op12; \ > - ROTATE(x1,7); ROTATE(x5,7); \ > - ROTATE(x9,7); ROTATE(x13,7); \ > - ROTATE(y1,7); ROTATE(y5,7); \ > - ROTATE(y9,7); ROTATE(y13,7); > - > -#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\ > - y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \ > - QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ > - x8,x9,x10,x11,x12,x13,x14,x15,\ > - y0,y1,y2,y3,y4,y5,y6,y7,\ > - y8,y9,y10,y11,y12,y13,y14,y15,\ > - ,,,,,,,,,,,) > - > 
-#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \ > - vmrhf tmp0, v0, v1; \ > - vmrhf tmp1, v2, v3; \ > - vmrlf tmp2, v0, v1; \ > - vmrlf v3, v2, v3; \ > - vmrhf tmpa, va, vb; \ > - vmrhf tmpb, vc, vd; \ > - vmrlf tmpc, va, vb; \ > - vmrlf vd, vc, vd; \ > - vpdi v0, tmp0, tmp1, 0; \ > - vpdi v1, tmp0, tmp1, 5; \ > - vpdi v2, tmp2, v3, 0; \ > - vpdi v3, tmp2, v3, 5; \ > - vpdi va, tmpa, tmpb, 0; \ > - vpdi vb, tmpa, tmpb, 5; \ > - vpdi vc, tmpc, vd, 0; \ > - vpdi vd, tmpc, vd, 5; > - > -.balign 8 > -.globl __chacha20_s390x_vx_blocks8 > -ENTRY (__chacha20_s390x_vx_blocks8) > - /* input: > - * %r2: input > - * %r3: dst > - * %r4: src > - * %r5: nblks (multiple of 8) > - */ > - > - START_STACK(%r8); > - lgr NBLKS, %r5; > - > - larl %r7, .Lconsts; > - > - /* Load counter. */ > - lg %r8, (12 * 4)(INPUT); > - rllg %r8, %r8, 32; > - > -.balign 4 > - /* Process eight chacha20 blocks per loop. */ > -.Lloop8: > - vlm Y0, Y3, 0(INPUT); > - > - slgfi NBLKS, 8; > - lghi ROUND, (20 / 2); > - > - /* Construct counter vectors X12/X13 & Y12/Y13. */ > - vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7); > - vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7); > - vrepf Y12, Y3, 0; > - vrepf Y13, Y3, 1; > - vaccf X5, Y12, X4; > - vaccf Y5, Y12, Y4; > - vaf X12, Y12, X4; > - vaf Y12, Y12, Y4; > - vaf X13, Y13, X5; > - vaf Y13, Y13, Y5; > - > - vrepf X0, Y0, 0; > - vrepf X1, Y0, 1; > - vrepf X2, Y0, 2; > - vrepf X3, Y0, 3; > - vrepf X4, Y1, 0; > - vrepf X5, Y1, 1; > - vrepf X6, Y1, 2; > - vrepf X7, Y1, 3; > - vrepf X8, Y2, 0; > - vrepf X9, Y2, 1; > - vrepf X10, Y2, 2; > - vrepf X11, Y2, 3; > - vrepf X14, Y3, 2; > - vrepf X15, Y3, 3; > - > - /* Store counters for blocks 0-7. 
*/ > - vstm X12, X13, (STACK_CTR + 0 * 16)(%r15); > - vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15); > - > - vlr Y0, X0; > - vlr Y1, X1; > - vlr Y2, X2; > - vlr Y3, X3; > - vlr Y4, X4; > - vlr Y5, X5; > - vlr Y6, X6; > - vlr Y7, X7; > - vlr Y8, X8; > - vlr Y9, X9; > - vlr Y10, X10; > - vlr Y11, X11; > - vlr Y14, X14; > - vlr Y15, X15; > - > - /* Update and store counter. */ > - agfi %r8, 8; > - rllg %r5, %r8, 32; > - stg %r5, (12 * 4)(INPUT); > - > -.balign 4 > -.Lround2_8: > - QUARTERROUND4_V8(X0, X4, X8, X12, X1, X5, X9, X13, > - X2, X6, X10, X14, X3, X7, X11, X15, > - Y0, Y4, Y8, Y12, Y1, Y5, Y9, Y13, > - Y2, Y6, Y10, Y14, Y3, Y7, Y11, Y15); > - QUARTERROUND4_V8(X0, X5, X10, X15, X1, X6, X11, X12, > - X2, X7, X8, X13, X3, X4, X9, X14, > - Y0, Y5, Y10, Y15, Y1, Y6, Y11, Y12, > - Y2, Y7, Y8, Y13, Y3, Y4, Y9, Y14); > - brctg ROUND, .Lround2_8; > - > - /* Store blocks 4-7. */ > - vstm Y0, Y15, STACK_Y0_Y15(%r15); > - > - /* Load counters for blocks 0-3. */ > - vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15); > - > - lghi ROUND, 1; > - j .Lfirst_output_4blks_8; > - > -.balign 4 > -.Lsecond_output_4blks_8: > - /* Load blocks 4-7. */ > - vlm X0, X15, STACK_Y0_Y15(%r15); > - > - /* Load counters for blocks 4-7. */ > - vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15); > - > - lghi ROUND, 0; > - > -.balign 4 > - /* Output four chacha20 blocks per loop. 
*/ > -.Lfirst_output_4blks_8: > - vlm Y12, Y15, 0(INPUT); > - PLUS(X12, Y0); > - PLUS(X13, Y1); > - vrepf Y0, Y12, 0; > - vrepf Y1, Y12, 1; > - vrepf Y2, Y12, 2; > - vrepf Y3, Y12, 3; > - vrepf Y4, Y13, 0; > - vrepf Y5, Y13, 1; > - vrepf Y6, Y13, 2; > - vrepf Y7, Y13, 3; > - vrepf Y8, Y14, 0; > - vrepf Y9, Y14, 1; > - vrepf Y10, Y14, 2; > - vrepf Y11, Y14, 3; > - vrepf Y14, Y15, 2; > - vrepf Y15, Y15, 3; > - PLUS(X0, Y0); > - PLUS(X1, Y1); > - PLUS(X2, Y2); > - PLUS(X3, Y3); > - PLUS(X4, Y4); > - PLUS(X5, Y5); > - PLUS(X6, Y6); > - PLUS(X7, Y7); > - PLUS(X8, Y8); > - PLUS(X9, Y9); > - PLUS(X10, Y10); > - PLUS(X11, Y11); > - PLUS(X14, Y14); > - PLUS(X15, Y15); > - > - vl Y15, (.Lbswap32 - .Lconsts)(%r7); > - TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7, > - Y9, Y10, Y11, Y12, Y13, Y14); > - TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15, > - Y9, Y10, Y11, Y12, Y13, Y14); > - > - vlm Y0, Y14, 0(SRC); > - vperm X0, X0, X0, Y15; > - vperm X1, X1, X1, Y15; > - vperm X2, X2, X2, Y15; > - vperm X3, X3, X3, Y15; > - vperm X4, X4, X4, Y15; > - vperm X5, X5, X5, Y15; > - vperm X6, X6, X6, Y15; > - vperm X7, X7, X7, Y15; > - vperm X8, X8, X8, Y15; > - vperm X9, X9, X9, Y15; > - vperm X10, X10, X10, Y15; > - vperm X11, X11, X11, Y15; > - vperm X12, X12, X12, Y15; > - vperm X13, X13, X13, Y15; > - vperm X14, X14, X14, Y15; > - vperm X15, X15, X15, Y15; > - vl Y15, (15 * 16)(SRC); > - > - XOR(Y0, X0); > - XOR(Y1, X4); > - XOR(Y2, X8); > - XOR(Y3, X12); > - XOR(Y4, X1); > - XOR(Y5, X5); > - XOR(Y6, X9); > - XOR(Y7, X13); > - XOR(Y8, X2); > - XOR(Y9, X6); > - XOR(Y10, X10); > - XOR(Y11, X14); > - XOR(Y12, X3); > - XOR(Y13, X7); > - XOR(Y14, X11); > - XOR(Y15, X15); > - vstm Y0, Y15, 0(DST); > - > - aghi SRC, 256; > - aghi DST, 256; > - > - clgije ROUND, 1, .Lsecond_output_4blks_8; > - > - clgijhe NBLKS, 8, .Lloop8; > - > - > - END_STACK(%r8); > - xgr %r2, %r2; > - br %r14; > -END (__chacha20_s390x_vx_blocks8) > - > -#endif /* HAVE_S390_VX_ASM_SUPPORT */ Ok. 
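For readers skimming the removed assembly: the PLUS/XOR/ROTATE macros in the powerpc and s390x files above are vectorized forms of the scalar ChaCha20 quarter round (add, xor, rotate left by 16, 12, 8 and 7). A minimal scalar sketch in C follows. It is not code from the patch, and the names rotl32 and chacha20_quarter_round are illustrative assumptions.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32).  */
static inline uint32_t
rotl32 (uint32_t v, unsigned int n)
{
  return (v << n) | (v >> (32 - n));
}

/* Hypothetical helper, not part of glibc: one scalar ChaCha20 quarter
   round on four state words.  The removed vector code applies the same
   add/xor/rotate sequence to several blocks at once.  */
static inline void
chacha20_quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 7);
}
```

A double round applies this to the four columns of the 4x4 state and then to the four diagonals, which is what the paired QUARTERROUND invocations in the loops above do.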
> diff --git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h > deleted file mode 100644 > index 0c6abf77e8..0000000000 > --- a/sysdeps/s390/s390-64/chacha20_arch.h > +++ /dev/null > @@ -1,45 +0,0 @@ > -/* s390x optimization for ChaCha20.VE_S390_VX_ASM_SUPPORT > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. 
*/ > - > -#include <stdbool.h> > -#include <ldsodefs.h> > -#include <sys/auxv.h> > - > -unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t nblks) > - attribute_hidden; > - > -static inline void > -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, > - size_t bytes) > -{ > -#ifdef HAVE_S390_VX_ASM_SUPPORT > - _Static_assert (CHACHA20_BUFSIZE % 8 == 0, > - "CHACHA20_BUFSIZE not multiple of 8"); > - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, > - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); > - > - if (GLRO(dl_hwcap) & HWCAP_S390_VX) > - { > - __chacha20_s390x_vx_blocks8 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > - return; > - } > -#endif > - chacha20_crypt_generic (state, dst, src, bytes); > -} > diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h > index 2c58d5ae2f..a263d294b1 100644 > --- a/sysdeps/unix/sysv/linux/not-cancel.h > +++ b/sysdeps/unix/sysv/linux/not-cancel.h > @@ -23,6 +23,7 @@ > #include <sysdep.h> > #include <errno.h> > #include <unistd.h> > +#include <sys/poll.h> > #include <sys/syscall.h> > #include <sys/wait.h> > #include <time.h> > @@ -70,9 +71,14 @@ __writev_nocancel_nostatus (int fd, const struct iovec *iov, int iovcnt) > static inline int > __getrandom_nocancel (void *buf, size_t buflen, unsigned int flags) > { > - return INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags); > + return INLINE_SYSCALL_CALL (getrandom, buf, buflen, flags); > } > > +static inline int > +__poll_infinity_nocancel (struct pollfd *fds, nfds_t nfds) > +{ > + return INLINE_SYSCALL_CALL (ppoll, fds, nfds, NULL, NULL, 0); > +} > > /* Uncancelable fcntl. */ > __typeof (__fcntl) __fcntl64_nocancel; Ok, rv32 and arc already redefine __NR_ppoll to __NR_ppoll_time64, and we don't really care about the timeout.
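To illustrate why the timeout does not matter here: a NULL timeout makes ppoll block indefinitely, so the time64 layout of the timespec never comes into play. A rough userspace sketch of the same "poll forever" idea, using the public ppoll wrapper rather than the internal INLINE_SYSCALL_CALL macro, could look like the following. The helper name wait_readable_forever is made up for illustration, and the libc wrapper takes four arguments whereas the raw syscall in the patch passes a fifth (the sigset size).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not part of glibc: block until FD becomes
   readable.  Returns the ppoll result (1 when FD is ready) or -1
   with errno set on failure.  */
static int
wait_readable_forever (int fd)
{
  assert (fd >= 0);

  struct pollfd pfd = { .fd = fd, .events = POLLIN };
  int r;

  /* NULL timeout: wait indefinitely.  NULL sigmask: leave the signal
     mask untouched.  Retry if a signal handler interrupts the wait.  */
  do
    r = ppoll (&pfd, 1, NULL, NULL);
  while (r == -1 && errno == EINTR);

  return r;
}
```

With a NULL timeout this behaves like poll (fds, nfds, -1), which is how the Hurd variant of the macro in the same patch is defined.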
> diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c > index 0326ebb767..c8a9ed2d40 100644 > --- a/sysdeps/unix/sysv/linux/tls-internal.c > +++ b/sysdeps/unix/sysv/linux/tls-internal.c > @@ -16,7 +16,6 @@ > License along with the GNU C Library; if not, see > <https://www.gnu.org/licenses/>. */ > > -#include <stdlib/arc4random.h> > #include <string.h> > #include <tls-internal.h> > > @@ -26,13 +25,4 @@ __glibc_tls_internal_free (void) > struct pthread *self = THREAD_SELF; > free (self->tls_state.strsignal_buf); > free (self->tls_state.strerror_l_buf); > - > - if (self->tls_state.rand_state != NULL) > - { > - /* Clear any lingering random state prior so if the thread stack is > - cached it won't leak any data. */ > - explicit_bzero (self->tls_state.rand_state, > - sizeof (*self->tls_state.rand_state)); > - free (self->tls_state.rand_state); > - } > } Ok. > diff --git a/sysdeps/unix/sysv/linux/tls-internal.h b/sysdeps/unix/sysv/linux/tls-internal.h > index ebc65d896a..2ebe977802 100644 > --- a/sysdeps/unix/sysv/linux/tls-internal.h > +++ b/sysdeps/unix/sysv/linux/tls-internal.h > @@ -28,7 +28,6 @@ __glibc_tls_internal (void) > return &THREAD_SELF->tls_state; > } > > -/* Reset the arc4random TCB state on fork. */ > extern void __glibc_tls_internal_free (void) attribute_hidden; > > #endif > diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile > index 1178475d75..c19bef2dec 100644 > --- a/sysdeps/x86_64/Makefile > +++ b/sysdeps/x86_64/Makefile > @@ -5,13 +5,6 @@ ifeq ($(subdir),csu) > gen-as-const-headers += link-defines.sym > endif > > -ifeq ($(subdir),stdlib) > -sysdep_routines += \ > - chacha20-amd64-sse2 \ > - chacha20-amd64-avx2 \ > - # sysdep_routines > -endif > - > ifeq ($(subdir),gmon) > sysdep_routines += _mcount > # We cannot compile _mcount.S with -pg because that would create Ok. 
> diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S > deleted file mode 100644 > index aefd1cdbd0..0000000000 > --- a/sysdeps/x86_64/chacha20-amd64-avx2.S > +++ /dev/null > @@ -1,328 +0,0 @@ > -/* Optimized AVX2 implementation of ChaCha20 cipher. > - Copyright (C) 2022 Free Software Foundation, Inc. > - > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -/* chacha20-amd64-avx2.S - AVX2 implementation of ChaCha20 cipher > - > - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> > - > - This file is part of Libgcrypt. > - > - Libgcrypt is free software; you can redistribute it and/or modify > - it under the terms of the GNU Lesser General Public License as > - published by the Free Software Foundation; either version 2.1 of > - the License, or (at your option) any later version. > - > - Libgcrypt is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - GNU Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with this program; if not, see <https://www.gnu.org/licenses/>. > -*/ > - > -/* Based on D. J. 
Bernstein reference implementation at > - http://cr.yp.to/chacha.html: > - > - chacha-regs.c version 20080118 > - D. J. Bernstein > - Public domain. */ > - > -#include <sysdep.h> > - > -#ifdef PIC > -# define rRIP (%rip) > -#else > -# define rRIP > -#endif > - > -/* register macros */ > -#define INPUT %rdi > -#define DST %rsi > -#define SRC %rdx > -#define NBLKS %rcx > -#define ROUND %eax > - > -/* stack structure */ > -#define STACK_VEC_X12 (32) > -#define STACK_VEC_X13 (32 + STACK_VEC_X12) > -#define STACK_TMP (32 + STACK_VEC_X13) > -#define STACK_TMP1 (32 + STACK_TMP) > - > -#define STACK_MAX (32 + STACK_TMP1) > - > -/* vector registers */ > -#define X0 %ymm0 > -#define X1 %ymm1 > -#define X2 %ymm2 > -#define X3 %ymm3 > -#define X4 %ymm4 > -#define X5 %ymm5 > -#define X6 %ymm6 > -#define X7 %ymm7 > -#define X8 %ymm8 > -#define X9 %ymm9 > -#define X10 %ymm10 > -#define X11 %ymm11 > -#define X12 %ymm12 > -#define X13 %ymm13 > -#define X14 %ymm14 > -#define X15 %ymm15 > - > -#define X0h %xmm0 > -#define X1h %xmm1 > -#define X2h %xmm2 > -#define X3h %xmm3 > -#define X4h %xmm4 > -#define X5h %xmm5 > -#define X6h %xmm6 > -#define X7h %xmm7 > -#define X8h %xmm8 > -#define X9h %xmm9 > -#define X10h %xmm10 > -#define X11h %xmm11 > -#define X12h %xmm12 > -#define X13h %xmm13 > -#define X14h %xmm14 > -#define X15h %xmm15 > - > -/********************************************************************** > - helper macros > - **********************************************************************/ > - > -/* 4x4 32-bit integer matrix transpose */ > -#define transpose_4x4(x0,x1,x2,x3,t1,t2) \ > - vpunpckhdq x1, x0, t2; \ > - vpunpckldq x1, x0, x0; \ > - \ > - vpunpckldq x3, x2, t1; \ > - vpunpckhdq x3, x2, x2; \ > - \ > - vpunpckhqdq t1, x0, x1; \ > - vpunpcklqdq t1, x0, x0; \ > - \ > - vpunpckhqdq x2, t2, x3; \ > - vpunpcklqdq x2, t2, x2; > - > -/* 2x2 128-bit matrix transpose */ > -#define transpose_16byte_2x2(x0,x1,t1) \ > - vmovdqa x0, t1; \ > - vperm2i128 $0x20, x1, x0, x0; \ 
> - vperm2i128 $0x31, x1, t1, x1; > - > -/********************************************************************** > - 8-way chacha20 > - **********************************************************************/ > - > -#define ROTATE2(v1,v2,c,tmp) \ > - vpsrld $(32 - (c)), v1, tmp; \ > - vpslld $(c), v1, v1; \ > - vpaddb tmp, v1, v1; \ > - vpsrld $(32 - (c)), v2, tmp; \ > - vpslld $(c), v2, v2; \ > - vpaddb tmp, v2, v2; > - > -#define ROTATE_SHUF_2(v1,v2,shuf) \ > - vpshufb shuf, v1, v1; \ > - vpshufb shuf, v2, v2; > - > -#define XOR(ds,s) \ > - vpxor s, ds, ds; > - > -#define PLUS(ds,s) \ > - vpaddd s, ds, ds; > - > -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\ > - interleave_op1,interleave_op2,\ > - interleave_op3,interleave_op4) \ > - vbroadcasti128 .Lshuf_rol16 rRIP, tmp1; \ > - interleave_op1; \ > - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ > - ROTATE_SHUF_2(d1, d2, tmp1); \ > - interleave_op2; \ > - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ > - ROTATE2(b1, b2, 12, tmp1); \ > - vbroadcasti128 .Lshuf_rol8 rRIP, tmp1; \ > - interleave_op3; \ > - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ > - ROTATE_SHUF_2(d1, d2, tmp1); \ > - interleave_op4; \ > - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ > - ROTATE2(b1, b2, 7, tmp1); > - > - .section .text.avx2, "ax", @progbits > - .align 32 > -chacha20_data: > -L(shuf_rol16): > - .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13 > -L(shuf_rol8): > - .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14 > -L(inc_counter): > - .byte 0,1,2,3,4,5,6,7 > -L(unsigned_cmp): > - .long 0x80000000 > - > - .hidden __chacha20_avx2_blocks8 > -ENTRY (__chacha20_avx2_blocks8) > - /* input: > - * %rdi: input > - * %rsi: dst > - * %rdx: src > - * %rcx: nblks (multiple of 8) > - */ > - vzeroupper; > - > - pushq %rbp; > - cfi_adjust_cfa_offset(8); > - cfi_rel_offset(rbp, 0) > - movq %rsp, %rbp; > - cfi_def_cfa_register(rbp); > - > - subq $STACK_MAX, %rsp; > - andq $~31, %rsp; > - > -L(loop8): > - mov $20, ROUND; > - 
> - /* Construct counter vectors X12 and X13 */ > - vpmovzxbd L(inc_counter) rRIP, X0; > - vpbroadcastd L(unsigned_cmp) rRIP, X2; > - vpbroadcastd (12 * 4)(INPUT), X12; > - vpbroadcastd (13 * 4)(INPUT), X13; > - vpaddd X0, X12, X12; > - vpxor X2, X0, X0; > - vpxor X2, X12, X1; > - vpcmpgtd X1, X0, X0; > - vpsubd X0, X13, X13; > - vmovdqa X12, (STACK_VEC_X12)(%rsp); > - vmovdqa X13, (STACK_VEC_X13)(%rsp); > - > - /* Load vectors */ > - vpbroadcastd (0 * 4)(INPUT), X0; > - vpbroadcastd (1 * 4)(INPUT), X1; > - vpbroadcastd (2 * 4)(INPUT), X2; > - vpbroadcastd (3 * 4)(INPUT), X3; > - vpbroadcastd (4 * 4)(INPUT), X4; > - vpbroadcastd (5 * 4)(INPUT), X5; > - vpbroadcastd (6 * 4)(INPUT), X6; > - vpbroadcastd (7 * 4)(INPUT), X7; > - vpbroadcastd (8 * 4)(INPUT), X8; > - vpbroadcastd (9 * 4)(INPUT), X9; > - vpbroadcastd (10 * 4)(INPUT), X10; > - vpbroadcastd (11 * 4)(INPUT), X11; > - vpbroadcastd (14 * 4)(INPUT), X14; > - vpbroadcastd (15 * 4)(INPUT), X15; > - vmovdqa X15, (STACK_TMP)(%rsp); > - > -L(round2): > - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X15,,,,) > - vmovdqa (STACK_TMP)(%rsp), X15; > - vmovdqa X8, (STACK_TMP)(%rsp); > - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,,,,) > - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,,,,) > - vmovdqa (STACK_TMP)(%rsp), X8; > - vmovdqa X15, (STACK_TMP)(%rsp); > - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X15,,,,) > - sub $2, ROUND; > - jnz L(round2); > - > - vmovdqa X8, (STACK_TMP1)(%rsp); > - > - /* tmp := X15 */ > - vpbroadcastd (0 * 4)(INPUT), X15; > - PLUS(X0, X15); > - vpbroadcastd (1 * 4)(INPUT), X15; > - PLUS(X1, X15); > - vpbroadcastd (2 * 4)(INPUT), X15; > - PLUS(X2, X15); > - vpbroadcastd (3 * 4)(INPUT), X15; > - PLUS(X3, X15); > - vpbroadcastd (4 * 4)(INPUT), X15; > - PLUS(X4, X15); > - vpbroadcastd (5 * 4)(INPUT), X15; > - PLUS(X5, X15); > - vpbroadcastd (6 * 4)(INPUT), X15; > - PLUS(X6, X15); > - vpbroadcastd (7 * 4)(INPUT), X15; > - PLUS(X7, X15); > - 
transpose_4x4(X0, X1, X2, X3, X8, X15); > - transpose_4x4(X4, X5, X6, X7, X8, X15); > - vmovdqa (STACK_TMP1)(%rsp), X8; > - transpose_16byte_2x2(X0, X4, X15); > - transpose_16byte_2x2(X1, X5, X15); > - transpose_16byte_2x2(X2, X6, X15); > - transpose_16byte_2x2(X3, X7, X15); > - vmovdqa (STACK_TMP)(%rsp), X15; > - vmovdqu X0, (64 * 0 + 16 * 0)(DST) > - vmovdqu X1, (64 * 1 + 16 * 0)(DST) > - vpbroadcastd (8 * 4)(INPUT), X0; > - PLUS(X8, X0); > - vpbroadcastd (9 * 4)(INPUT), X0; > - PLUS(X9, X0); > - vpbroadcastd (10 * 4)(INPUT), X0; > - PLUS(X10, X0); > - vpbroadcastd (11 * 4)(INPUT), X0; > - PLUS(X11, X0); > - vmovdqa (STACK_VEC_X12)(%rsp), X0; > - PLUS(X12, X0); > - vmovdqa (STACK_VEC_X13)(%rsp), X0; > - PLUS(X13, X0); > - vpbroadcastd (14 * 4)(INPUT), X0; > - PLUS(X14, X0); > - vpbroadcastd (15 * 4)(INPUT), X0; > - PLUS(X15, X0); > - vmovdqu X2, (64 * 2 + 16 * 0)(DST) > - vmovdqu X3, (64 * 3 + 16 * 0)(DST) > - > - /* Update counter */ > - addq $8, (12 * 4)(INPUT); > - > - transpose_4x4(X8, X9, X10, X11, X0, X1); > - transpose_4x4(X12, X13, X14, X15, X0, X1); > - vmovdqu X4, (64 * 4 + 16 * 0)(DST) > - vmovdqu X5, (64 * 5 + 16 * 0)(DST) > - transpose_16byte_2x2(X8, X12, X0); > - transpose_16byte_2x2(X9, X13, X0); > - transpose_16byte_2x2(X10, X14, X0); > - transpose_16byte_2x2(X11, X15, X0); > - vmovdqu X6, (64 * 6 + 16 * 0)(DST) > - vmovdqu X7, (64 * 7 + 16 * 0)(DST) > - vmovdqu X8, (64 * 0 + 16 * 2)(DST) > - vmovdqu X9, (64 * 1 + 16 * 2)(DST) > - vmovdqu X10, (64 * 2 + 16 * 2)(DST) > - vmovdqu X11, (64 * 3 + 16 * 2)(DST) > - vmovdqu X12, (64 * 4 + 16 * 2)(DST) > - vmovdqu X13, (64 * 5 + 16 * 2)(DST) > - vmovdqu X14, (64 * 6 + 16 * 2)(DST) > - vmovdqu X15, (64 * 7 + 16 * 2)(DST) > - > - sub $8, NBLKS; > - lea (8 * 64)(DST), DST; > - lea (8 * 64)(SRC), SRC; > - jnz L(loop8); > - > - vzeroupper; > - > - /* eax zeroed by round loop. 
*/ > - leave; > - cfi_adjust_cfa_offset(-8) > - cfi_def_cfa_register(%rsp); > - ret; > - int3; > -END(__chacha20_avx2_blocks8) > diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S > deleted file mode 100644 > index 351a1109c6..0000000000 > --- a/sysdeps/x86_64/chacha20-amd64-sse2.S > +++ /dev/null > @@ -1,311 +0,0 @@ > -/* Optimized SSE2 implementation of ChaCha20 cipher. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -/* chacha20-amd64-ssse3.S - SSSE3 implementation of ChaCha20 cipher > - > - Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi> > - > - This file is part of Libgcrypt. > - > - Libgcrypt is free software; you can redistribute it and/or modify > - it under the terms of the GNU Lesser General Public License as > - published by the Free Software Foundation; either version 2.1 of > - the License, or (at your option) any later version. > - > - Libgcrypt is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - GNU Lesser General Public License for more details. 
> - > - You should have received a copy of the GNU Lesser General Public > - License along with this program; if not, see <https://www.gnu.org/licenses/>. > -*/ > - > -/* Based on D. J. Bernstein reference implementation at > - http://cr.yp.to/chacha.html: > - > - chacha-regs.c version 20080118 > - D. J. Bernstein > - Public domain. */ > - > -#include <sysdep.h> > -#include <isa-level.h> > - > -#if MINIMUM_X86_ISA_LEVEL <= 2 > - > -#ifdef PIC > -# define rRIP (%rip) > -#else > -# define rRIP > -#endif > - > -/* 'ret' instruction replacement for straight-line speculation mitigation */ > -#define ret_spec_stop \ > - ret; int3; > - > -/* register macros */ > -#define INPUT %rdi > -#define DST %rsi > -#define SRC %rdx > -#define NBLKS %rcx > -#define ROUND %eax > - > -/* stack structure */ > -#define STACK_VEC_X12 (16) > -#define STACK_VEC_X13 (16 + STACK_VEC_X12) > -#define STACK_TMP (16 + STACK_VEC_X13) > -#define STACK_TMP1 (16 + STACK_TMP) > -#define STACK_TMP2 (16 + STACK_TMP1) > - > -#define STACK_MAX (16 + STACK_TMP2) > - > -/* vector registers */ > -#define X0 %xmm0 > -#define X1 %xmm1 > -#define X2 %xmm2 > -#define X3 %xmm3 > -#define X4 %xmm4 > -#define X5 %xmm5 > -#define X6 %xmm6 > -#define X7 %xmm7 > -#define X8 %xmm8 > -#define X9 %xmm9 > -#define X10 %xmm10 > -#define X11 %xmm11 > -#define X12 %xmm12 > -#define X13 %xmm13 > -#define X14 %xmm14 > -#define X15 %xmm15 > - > -/********************************************************************** > - helper macros > - **********************************************************************/ > - > -/* 4x4 32-bit integer matrix transpose */ > -#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \ > - movdqa x0, t2; \ > - punpckhdq x1, t2; \ > - punpckldq x1, x0; \ > - \ > - movdqa x2, t1; \ > - punpckldq x3, t1; \ > - punpckhdq x3, x2; \ > - \ > - movdqa x0, x1; \ > - punpckhqdq t1, x1; \ > - punpcklqdq t1, x0; \ > - \ > - movdqa t2, x3; \ > - punpckhqdq x2, x3; \ > - punpcklqdq x2, t2; \ > - movdqa t2, x2; > - > 
-/* fill xmm register with 32-bit value from memory */ > -#define PBROADCASTD(mem32, xreg) \ > - movd mem32, xreg; \ > - pshufd $0, xreg, xreg; > - > -/********************************************************************** > - 4-way chacha20 > - **********************************************************************/ > - > -#define ROTATE2(v1,v2,c,tmp1,tmp2) \ > - movdqa v1, tmp1; \ > - movdqa v2, tmp2; \ > - psrld $(32 - (c)), v1; \ > - pslld $(c), tmp1; \ > - paddb tmp1, v1; \ > - psrld $(32 - (c)), v2; \ > - pslld $(c), tmp2; \ > - paddb tmp2, v2; > - > -#define XOR(ds,s) \ > - pxor s, ds; > - > -#define PLUS(ds,s) \ > - paddd s, ds; > - > -#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2) \ > - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ > - ROTATE2(d1, d2, 16, tmp1, tmp2); \ > - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ > - ROTATE2(b1, b2, 12, tmp1, tmp2); \ > - PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ > - ROTATE2(d1, d2, 8, tmp1, tmp2); \ > - PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ > - ROTATE2(b1, b2, 7, tmp1, tmp2); > - > - .section .text.sse2,"ax",@progbits > - > -chacha20_data: > - .align 16 > -L(counter1): > - .long 1,0,0,0 > -L(inc_counter): > - .long 0,1,2,3 > -L(unsigned_cmp): > - .long 0x80000000,0x80000000,0x80000000,0x80000000 > - > - .hidden __chacha20_sse2_blocks4 > -ENTRY (__chacha20_sse2_blocks4) > - /* input: > - * %rdi: input > - * %rsi: dst > - * %rdx: src > - * %rcx: nblks (multiple of 4) > - */ > - > - pushq %rbp; > - cfi_adjust_cfa_offset(8); > - cfi_rel_offset(rbp, 0) > - movq %rsp, %rbp; > - cfi_def_cfa_register(%rbp); > - > - subq $STACK_MAX, %rsp; > - andq $~15, %rsp; > - > -L(loop4): > - mov $20, ROUND; > - > - /* Construct counter vectors X12 and X13 */ > - movdqa L(inc_counter) rRIP, X0; > - movdqa L(unsigned_cmp) rRIP, X2; > - PBROADCASTD((12 * 4)(INPUT), X12); > - PBROADCASTD((13 * 4)(INPUT), X13); > - paddd X0, X12; > - movdqa X12, X1; > - pxor X2, X0; > - pxor X2, X1; > - 
pcmpgtd X1, X0; > - psubd X0, X13; > - movdqa X12, (STACK_VEC_X12)(%rsp); > - movdqa X13, (STACK_VEC_X13)(%rsp); > - > - /* Load vectors */ > - PBROADCASTD((0 * 4)(INPUT), X0); > - PBROADCASTD((1 * 4)(INPUT), X1); > - PBROADCASTD((2 * 4)(INPUT), X2); > - PBROADCASTD((3 * 4)(INPUT), X3); > - PBROADCASTD((4 * 4)(INPUT), X4); > - PBROADCASTD((5 * 4)(INPUT), X5); > - PBROADCASTD((6 * 4)(INPUT), X6); > - PBROADCASTD((7 * 4)(INPUT), X7); > - PBROADCASTD((8 * 4)(INPUT), X8); > - PBROADCASTD((9 * 4)(INPUT), X9); > - PBROADCASTD((10 * 4)(INPUT), X10); > - PBROADCASTD((11 * 4)(INPUT), X11); > - PBROADCASTD((14 * 4)(INPUT), X14); > - PBROADCASTD((15 * 4)(INPUT), X15); > - movdqa X11, (STACK_TMP)(%rsp); > - movdqa X15, (STACK_TMP1)(%rsp); > - > -L(round2_4): > - QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X11,X15) > - movdqa (STACK_TMP)(%rsp), X11; > - movdqa (STACK_TMP1)(%rsp), X15; > - movdqa X8, (STACK_TMP)(%rsp); > - movdqa X9, (STACK_TMP1)(%rsp); > - QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,X9) > - QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,X9) > - movdqa (STACK_TMP)(%rsp), X8; > - movdqa (STACK_TMP1)(%rsp), X9; > - movdqa X11, (STACK_TMP)(%rsp); > - movdqa X15, (STACK_TMP1)(%rsp); > - QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X11,X15) > - sub $2, ROUND; > - jnz L(round2_4); > - > - /* tmp := X15 */ > - movdqa (STACK_TMP)(%rsp), X11; > - PBROADCASTD((0 * 4)(INPUT), X15); > - PLUS(X0, X15); > - PBROADCASTD((1 * 4)(INPUT), X15); > - PLUS(X1, X15); > - PBROADCASTD((2 * 4)(INPUT), X15); > - PLUS(X2, X15); > - PBROADCASTD((3 * 4)(INPUT), X15); > - PLUS(X3, X15); > - PBROADCASTD((4 * 4)(INPUT), X15); > - PLUS(X4, X15); > - PBROADCASTD((5 * 4)(INPUT), X15); > - PLUS(X5, X15); > - PBROADCASTD((6 * 4)(INPUT), X15); > - PLUS(X6, X15); > - PBROADCASTD((7 * 4)(INPUT), X15); > - PLUS(X7, X15); > - PBROADCASTD((8 * 4)(INPUT), X15); > - PLUS(X8, X15); > - PBROADCASTD((9 * 4)(INPUT), X15); > - PLUS(X9, X15); > - PBROADCASTD((10 * 
4)(INPUT), X15); > - PLUS(X10, X15); > - PBROADCASTD((11 * 4)(INPUT), X15); > - PLUS(X11, X15); > - movdqa (STACK_VEC_X12)(%rsp), X15; > - PLUS(X12, X15); > - movdqa (STACK_VEC_X13)(%rsp), X15; > - PLUS(X13, X15); > - movdqa X13, (STACK_TMP)(%rsp); > - PBROADCASTD((14 * 4)(INPUT), X15); > - PLUS(X14, X15); > - movdqa (STACK_TMP1)(%rsp), X15; > - movdqa X14, (STACK_TMP1)(%rsp); > - PBROADCASTD((15 * 4)(INPUT), X13); > - PLUS(X15, X13); > - movdqa X15, (STACK_TMP2)(%rsp); > - > - /* Update counter */ > - addq $4, (12 * 4)(INPUT); > - > - TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15); > - movdqu X0, (64 * 0 + 16 * 0)(DST) > - movdqu X1, (64 * 1 + 16 * 0)(DST) > - movdqu X2, (64 * 2 + 16 * 0)(DST) > - movdqu X3, (64 * 3 + 16 * 0)(DST) > - TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2); > - movdqa (STACK_TMP)(%rsp), X13; > - movdqa (STACK_TMP1)(%rsp), X14; > - movdqa (STACK_TMP2)(%rsp), X15; > - movdqu X4, (64 * 0 + 16 * 1)(DST) > - movdqu X5, (64 * 1 + 16 * 1)(DST) > - movdqu X6, (64 * 2 + 16 * 1)(DST) > - movdqu X7, (64 * 3 + 16 * 1)(DST) > - TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2); > - movdqu X8, (64 * 0 + 16 * 2)(DST) > - movdqu X9, (64 * 1 + 16 * 2)(DST) > - movdqu X10, (64 * 2 + 16 * 2)(DST) > - movdqu X11, (64 * 3 + 16 * 2)(DST) > - TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2); > - movdqu X12, (64 * 0 + 16 * 3)(DST) > - movdqu X13, (64 * 1 + 16 * 3)(DST) > - movdqu X14, (64 * 2 + 16 * 3)(DST) > - movdqu X15, (64 * 3 + 16 * 3)(DST) > - > - sub $4, NBLKS; > - lea (4 * 64)(DST), DST; > - lea (4 * 64)(SRC), SRC; > - jnz L(loop4); > - > - /* eax zeroed by round loop. */ > - leave; > - cfi_adjust_cfa_offset(-8) > - cfi_def_cfa_register(%rsp); > - ret_spec_stop; > -END (__chacha20_sse2_blocks4) > - > -#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */ Ok. 
> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h > deleted file mode 100644 > index 6f3784e392..0000000000 > --- a/sysdeps/x86_64/chacha20_arch.h > +++ /dev/null > @@ -1,55 +0,0 @@ > -/* Chacha20 implementation, used on arc4random. > - Copyright (C) 2022 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. 
*/ > - > -#include <isa-level.h> > -#include <ldsodefs.h> > -#include <cpu-features.h> > -#include <sys/param.h> > - > -unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t nblks) > - attribute_hidden; > -unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst, > - const uint8_t *src, size_t nblks) > - attribute_hidden; > - > -static inline void > -chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, > - size_t bytes) > -{ > - _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0, > - "CHACHA20_BUFSIZE not multiple of 4 or 8"); > - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, > - "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); > - > -#if MINIMUM_X86_ISA_LEVEL > 2 > - __chacha20_avx2_blocks8 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > -#else > - const struct cpu_features* cpu_features = __get_cpu_features (); > - > - /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. */ > - if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) > - && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !)) > - __chacha20_avx2_blocks8 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > - else > - __chacha20_sse2_blocks4 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > -#endif > -} Ok.
* Re: [PATCH v6] arc4random: simplify design for better safety 2022-07-26 20:17 ` Adhemerval Zanella Netto @ 2022-07-26 20:56 ` Adhemerval Zanella Netto 0 siblings, 0 replies; 81+ messages in thread From: Adhemerval Zanella Netto @ 2022-07-26 20:56 UTC (permalink / raw) To: Jason A. Donenfeld, libc-alpha Cc: Florian Weimer, Cristian Rodríguez, Paul Eggert, Mark Harris, Eric Biggers, linux-crypto On 26/07/22 17:17, Adhemerval Zanella Netto wrote: > > > On 26/07/22 16:58, Jason A. Donenfeld wrote: >> Rather than buffering 16 MiB of entropy in userspace (by way of >> chacha20), simply call getrandom() every time. >> >> This approach is doubtlessly slower, for now, but trying to prematurely >> optimize arc4random appears to be leading toward all sorts of nasty >> properties and gotchas. Instead, this patch takes a much more >> conservative approach. The interface is added as a basic loop wrapper >> around getrandom(), and then later, the kernel and libc together can >> work together on optimizing that. >> >> This prevents numerous issues in which userspace is unaware of when it >> really must throw away its buffer, since we avoid buffering all >> together. Future improvements may include userspace learning more from >> the kernel about when to do that, which might make these sorts of >> chacha20-based optimizations more possible. The current heuristic of 16 >> MiB is meaningless garbage that doesn't correspond to anything the >> kernel might know about. So for now, let's just do something >> conservative that we know is correct and won't lead to cryptographic >> issues for users of this function. >> >> This patch might be considered along the lines of, "optimization is the >> root of all evil," in that the much more complex implementation it >> replaces moves too fast without considering security implications, >> whereas the incremental approach done here is a much safer way of going >> about things. 
Once this lands, we can take our time in optimizing this >> properly using new interplay between the kernel and userspace. >> >> getrandom(0) is used, since that's the one that ensures the bytes >> returned are cryptographically secure. But on systems without it, we >> fallback to using /dev/urandom. This is unfortunate because it means >> opening a file descriptor, but there's not much of a choice. Secondly, >> as part of the fallback, in order to get more or less the same >> properties of getrandom(0), we poll on /dev/random, and if the poll >> succeeds at least once, then we assume the RNG is initialized. This is a >> rough approximation, as the ancient "non-blocking pool" initialized >> after the "blocking pool", not before, and it may not port back to all >> ancient kernels, though it does to all kernels supported by glibc >> (≥3.2), so generally it's the best approximation we can do. >> >> The motivation for including arc4random, in the first place, is to have >> source-level compatibility with existing code. That means this patch >> doesn't attempt to litigate the interface itself. It does, however, >> choose a conservative approach for implementing it. > > LGTM, I agree this is a safe solution for 2.36; we can optimize it later > if that turns out to be necessary. > > I will run some tests and push it upstream. > > Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> And I think we will need to tune down the internal parameters of stdlib/tst-arc4random-thread, because it now takes about 1 minute on my testing machine (which has a fairly recent processor). I will send a patch to adjust the maximum number of threads depending on the number of configured system CPUs (to avoid syscall contention).
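For readers following along, the "basic loop wrapper around getrandom()" described in the quoted commit message can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: the names sketch_random_buf and sketch_random_u32 are hypothetical, and the public getrandom() from <sys/random.h> stands in for glibc's internal __getrandom_nocancel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper (not glibc's implementation): fill BUF with N random
   bytes by calling getrandom() with flags 0 until the buffer is full.
   Returns 0 on success and -1 on failure, with errno set by getrandom().  */
static int
sketch_random_buf (void *buf, size_t n)
{
  unsigned char *p = buf;

  while (n > 0)
    {
      /* Flags 0: block until the kernel RNG is initialized.  */
      ssize_t r = getrandom (p, n, 0);
      if (r < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted by a signal: retry.  */
          return -1;            /* E.g. ENOSYS on kernels without getrandom.  */
        }
      /* getrandom() may return fewer bytes than requested.  */
      p += r;
      n -= (size_t) r;
    }
  return 0;
}

/* Hypothetical helper: one 32-bit value, with no userspace buffering.  */
static int
sketch_random_u32 (uint32_t *out)
{
  return sketch_random_buf (out, sizeof *out);
}
```

Every call goes to the kernel, so there is no userspace state to reseed or wipe on fork; that is the trade-off the commit message makes against speed.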
* Re: [PATCH v6] arc4random: simplify design for better safety 2022-07-26 19:58 ` [PATCH v6] " Jason A. Donenfeld 2022-07-26 20:17 ` Adhemerval Zanella Netto @ 2022-07-28 10:29 ` Szabolcs Nagy 2022-07-28 10:36 ` Szabolcs Nagy 1 sibling, 1 reply; 81+ messages in thread From: Szabolcs Nagy @ 2022-07-28 10:29 UTC (permalink / raw) To: Jason A. Donenfeld Cc: libc-alpha, adhemerval.zanella, Florian Weimer, Eric Biggers, linux-crypto The 07/26/2022 21:58, Jason A. Donenfeld via Libc-alpha wrote: > Rather than buffering 16 MiB of entropy in userspace (by way of > chacha20), simply call getrandom() every time. > > This approach is doubtlessly slower, for now, but trying to prematurely > optimize arc4random appears to be leading toward all sorts of nasty > properties and gotchas. Instead, this patch takes a much more > conservative approach. The interface is added as a basic loop wrapper > around getrandom(), and then later, the kernel and libc together can > work together on optimizing that. > > This prevents numerous issues in which userspace is unaware of when it > really must throw away its buffer, since we avoid buffering all > together. Future improvements may include userspace learning more from > the kernel about when to do that, which might make these sorts of > chacha20-based optimizations more possible. The current heuristic of 16 > MiB is meaningless garbage that doesn't correspond to anything the > kernel might know about. So for now, let's just do something > conservative that we know is correct and won't lead to cryptographic > issues for users of this function. > > This patch might be considered along the lines of, "optimization is the > root of all evil," in that the much more complex implementation it > replaces moves too fast without considering security implications, > whereas the incremental approach done here is a much safer way of going > about things. 
Once this lands, we can take our time in optimizing this > properly using new interplay between the kernel and userspace. > > getrandom(0) is used, since that's the one that ensures the bytes > returned are cryptographically secure. But on systems without it, we > fallback to using /dev/urandom. This is unfortunate because it means > opening a file descriptor, but there's not much of a choice. Secondly, > as part of the fallback, in order to get more or less the same > properties of getrandom(0), we poll on /dev/random, and if the poll > succeeds at least once, then we assume the RNG is initialized. This is a > rough approximation, as the ancient "non-blocking pool" initialized > after the "blocking pool", not before, and it may not port back to all > ancient kernels, though it does to all kernels supported by glibc > (≥3.2), so generally it's the best approximation we can do. > > The motivation for including arc4random, in the first place, is to have > source-level compatibility with existing code. That means this patch > doesn't attempt to litigate the interface itself. It does, however, > choose a conservative approach for implementing it. > > Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> > Cc: Florian Weimer <fweimer@redhat.com> > Cc: Cristian Rodríguez <crrodriguez@opensuse.org> > Cc: Paul Eggert <eggert@cs.ucla.edu> > Cc: Mark Harris <mark.hsj@gmail.com> > Cc: Eric Biggers <ebiggers@kernel.org> > Cc: linux-crypto@vger.kernel.org > Signed-off-by: Jason A. 
Donenfeld <Jason@zx2c4.com> fyi, after this patch i see FAIL: stdlib/tst-arc4random-thread with $ cat stdlib/tst-arc4random-thread.out info: arc4random: minimum of 1750000 blob results expected info: arc4random: 1750777 blob results observed info: arc4random_buf: minimum of 1750000 blob results expected info: arc4random_buf: 1750000 blob results observed info: arc4random_uniform: minimum of 1750000 blob results expected Timed out: killed the child process Termination time: 2022-07-27T14:41:33.766791947 Last write to standard output: 2022-07-27T14:41:22.522497854 on an arm and aarch64 builder. running it manually it takes >30s to complete. > --- > LICENSES | 23 - > NEWS | 4 +- > include/stdlib.h | 3 - > manual/math.texi | 13 +- > stdlib/Makefile | 2 - > stdlib/arc4random.c | 196 ++---- > stdlib/arc4random.h | 48 -- > stdlib/chacha20.c | 191 ------ > stdlib/tst-arc4random-chacha20.c | 167 ----- > sysdeps/aarch64/Makefile | 4 - > sysdeps/aarch64/chacha20-aarch64.S | 314 ---------- > sysdeps/aarch64/chacha20_arch.h | 40 -- > sysdeps/generic/chacha20_arch.h | 24 - > sysdeps/generic/not-cancel.h | 3 + > sysdeps/generic/tls-internal-struct.h | 1 - > sysdeps/generic/tls-internal.c | 10 - > sysdeps/mach/hurd/_Fork.c | 2 - > sysdeps/mach/hurd/not-cancel.h | 4 + > sysdeps/nptl/_Fork.c | 2 - > .../powerpc/powerpc64/be/multiarch/Makefile | 4 - > .../powerpc64/be/multiarch/chacha20-ppc.c | 1 - > .../powerpc64/be/multiarch/chacha20_arch.h | 42 -- > sysdeps/powerpc/powerpc64/power8/Makefile | 5 - > .../powerpc/powerpc64/power8/chacha20-ppc.c | 256 -------- > .../powerpc/powerpc64/power8/chacha20_arch.h | 37 -- > sysdeps/s390/s390-64/Makefile | 6 - > sysdeps/s390/s390-64/chacha20-s390x.S | 573 ------------------ > sysdeps/s390/s390-64/chacha20_arch.h | 45 -- > sysdeps/unix/sysv/linux/not-cancel.h | 8 +- > sysdeps/unix/sysv/linux/tls-internal.c | 10 - > sysdeps/unix/sysv/linux/tls-internal.h | 1 - > sysdeps/x86_64/Makefile | 7 - > sysdeps/x86_64/chacha20-amd64-avx2.S | 328 
---------- > sysdeps/x86_64/chacha20-amd64-sse2.S | 311 ---------- > sysdeps/x86_64/chacha20_arch.h | 55 -- > 35 files changed, 64 insertions(+), 2676 deletions(-) > delete mode 100644 stdlib/arc4random.h > delete mode 100644 stdlib/chacha20.c > delete mode 100644 stdlib/tst-arc4random-chacha20.c > delete mode 100644 sysdeps/aarch64/chacha20-aarch64.S > delete mode 100644 sysdeps/aarch64/chacha20_arch.h > delete mode 100644 sysdeps/generic/chacha20_arch.h > delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile > delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c > delete mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h > delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c > delete mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h > delete mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S > delete mode 100644 sysdeps/s390/s390-64/chacha20_arch.h > delete mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S > delete mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S > delete mode 100644 sysdeps/x86_64/chacha20_arch.h
* Re: [PATCH v6] arc4random: simplify design for better safety
  2022-07-28 10:29 ` Szabolcs Nagy
@ 2022-07-28 10:36 ` Szabolcs Nagy
  2022-07-28 11:01 ` Adhemerval Zanella
  0 siblings, 1 reply; 81+ messages in thread
From: Szabolcs Nagy @ 2022-07-28 10:36 UTC (permalink / raw)
  To: Jason A. Donenfeld, Florian Weimer, Eric Biggers, libc-alpha, linux-crypto

The 07/28/2022 11:29, Szabolcs Nagy via Libc-alpha wrote:
> The 07/26/2022 21:58, Jason A. Donenfeld via Libc-alpha wrote:
...
>
> fyi, after this patch i see
>
> FAIL: stdlib/tst-arc4random-thread
>
> with
>
> $ cat stdlib/tst-arc4random-thread.out
> info: arc4random: minimum of 1750000 blob results expected
> info: arc4random: 1750777 blob results observed
> info: arc4random_buf: minimum of 1750000 blob results expected
> info: arc4random_buf: 1750000 blob results observed
> info: arc4random_uniform: minimum of 1750000 blob results expected
> Timed out: killed the child process
> Termination time: 2022-07-27T14:41:33.766791947
> Last write to standard output: 2022-07-27T14:41:22.522497854
>
> on an arm and aarch64 builder.
>
> running it manually it takes >30s to complete.

note that before the patch it was <5s on the same machine.

^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH v6] arc4random: simplify design for better safety
  2022-07-28 10:36 ` Szabolcs Nagy
@ 2022-07-28 11:01 ` Adhemerval Zanella
  0 siblings, 0 replies; 81+ messages in thread
From: Adhemerval Zanella @ 2022-07-28 11:01 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Jason A. Donenfeld, Florian Weimer, Eric Biggers, libc-alpha, linux-crypto

On Thu, Jul 28, 2022 at 7:37 AM Szabolcs Nagy via Libc-alpha <libc-alpha@sourceware.org> wrote:
>
> The 07/28/2022 11:29, Szabolcs Nagy via Libc-alpha wrote:
> > The 07/26/2022 21:58, Jason A. Donenfeld via Libc-alpha wrote:
> ...
> >
> > fyi, after this patch i see
> >
> > FAIL: stdlib/tst-arc4random-thread
> >
> > with
> >
> > $ cat stdlib/tst-arc4random-thread.out
> > info: arc4random: minimum of 1750000 blob results expected
> > info: arc4random: 1750777 blob results observed
> > info: arc4random_buf: minimum of 1750000 blob results expected
> > info: arc4random_buf: 1750000 blob results observed
> > info: arc4random_uniform: minimum of 1750000 blob results expected
> > Timed out: killed the child process
> > Termination time: 2022-07-27T14:41:33.766791947
> > Last write to standard output: 2022-07-27T14:41:22.522497854
> >
> > on an arm and aarch64 builder.
> >
> > running it manually it takes >30s to complete.
>
> note that before the patch it was <5s on the same machine.

Yeap, we need to tune down the internal test parameters [1].

[1] https://patchwork.sourceware.org/project/glibc/patch/20220727131031.2016648-1-adhemerval.zanella@linaro.org/

^ permalink raw reply [flat|nested] 81+ messages in thread