From mboxrd@z Thu Jan 1 00:00:00 1970
From: "janderson at rice dot edu"
To: glibc-bugs@sourceware.org
Subject: [Bug dynamic-link/30007] rfe: dlopen to specified address
Date: Sat, 18 Mar 2023 23:28:56 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: dynamic-link
X-Bugzilla-Version: unspecified
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: janderson at rice dot edu
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"

https://sourceware.org/bugzilla/show_bug.cgi?id=30007

--- Comment #15 from Jonathon Anderson ---
First dealing with a few meta off-topics:

(In reply to Stas Sergeev from comment #9)
> > If this well-understood approach solves the problem, IMHO there isn't much
> > point in arguing this RFE further.
>
> It doesn't solve anything (except probably
> the reloc address), and the statements like
> this, together with the statement that my
> patch breaks your use-case or raises a
> security concerns, only suggests that you
> want to down-play any contributions that you
> review.
> In fact, since you never ever said a single
> word about how any of the multiple proposals
> (including DT_AUDIT for dlopen()) can be
> improved, I am quite sure its the case.
> I hope we can get more constructive.

As Adhemerval has already mentioned from the very start of this RFE (comment
#1):
> Any GNU extension requires a specific usercase that can't be easily accomplished with current API.

Thus, the first priority for this RFE should be to establish this use case
and express the failings of the current technology. A proposed patch series
is difficult to review and near-impossible to merge without reaching some
kind of consensus on these two points. Until this occurs, all but the most
preliminary work on a patch is a waste of many people's time and patience,
including yours as the author.

I also have limited time to investigate the details in my responses. In an
effort to remain useful (and succinct), I prioritize any discussion that
will lead closer to this first priority. This of course means I often cannot
discuss your contribution at length or make any suggestions; there is simply
too much else of higher priority to discuss, and it takes me multiple hours
of my spare time to collect that together in a cohesive reply. I hope you
can understand. :)

Coming back on topic, comment #8 establishes the succinct and sensible use
case for this RFE. This is half the requirement; what remains now is to
express the failings of the current technology for this use case. Comment #8
also describes the high-level view of an alternative solution using
technology currently available in GNU/Linux (Glibc on Linux).
The next step then is to discover where this solution fails for your
specific use case.

It would be very constructive if you could investigate my proposed solution
as detailed below, and precisely express what the insurmountable problems
with it are. :D

(In reply to Stas Sergeev from comment #13)
> OK in fact that approach is so much
> better, that supporting pre-existing
> APIs is irrelevant here. Trying to
> fulfill someone's request on stackoverflow
> was a huge mis-goal.
> So... thanks for pointing that I was
> heading the wrong direction.
> Will implement a small and simple
> dlmem() with an extra ops arg, and w/o
> any audit machinery.

Although a bit late now, I would advise against pursuing dlmem() unless the
extra no-file-descriptor functionality is absolutely required for your use
case. There are many open questions about the API, and it is clear dlmem()
will have a far larger impact than la_premap* ever would.

If you need the no-file-descriptor functionality and do want to continue
with dlmem(), I would recommend first developing a solid argument to assuage
the initial concerns raised by Carlos almost a month ago now
(https://sourceware.org/pipermail/libc-alpha/2023-February/145735.html).
Namely, establishing the use case in clear terms and expressing why the
alternative technology of "dlopenfd()" + memfd_create() fails to meet the
use case.

Coming back to the topic of the hour:

(In reply to Stas Sergeev from comment #9)
> > The proposed la_premap and la_premap_dlmem (part of the dlmem() patch)
> > collectively "solve" this problem by granting LD_AUDIT some limited control
> > over the object (segment) mapping process. My first impression from reading
> > the test cases, they seem a bit too specific to this use case.
> > IMHO they are
> > also out-of-scope for LD_AUDIT: LD_AUDIT works at the level of symbols and
> > objects, both generic across OSs and even binary formats (ELF + DLL),
> > whereas la_premap* expose an implementation detail of the dynamic linker.
>
> What exactly implementation detail?
> Its just "here's the length I need to
> map for solib. if you want, give me a
> buffer and/or fd for it".
> To me its quite similar to "here's the
> name of the solib. if you want, give
> me a different one to use".

Primarily, it is unclear to me which mmap flags an la_premap callback should
use, or in what order it should mmap to keep the page table consistent with
multiple threads (like _dl_map_segments). These are far deeper
implementation details than simply "here's the file path to use when looking
up an solib," and will depend heavily on the dynamic linker and the OS it is
compiled for. I do not believe it makes sense to expose details this deep
via LD_AUDIT.

On the API side, file descriptors are a concept specific to POSIX, and the
ELF standard (technically) does not require that the objects be mmap()'d.
While I do not believe there will be significant problems, it doesn't hurt
to be kinder to our non-Glibc friends, on the off-chance LD_AUDIT becomes
significantly more popular than it is today. :D

> > Most importantly, we do not yet deeply understand the implications exposing
> > these callbacks can have, security or otherwise.
>
> Any explanation why there can be any
> security concerns here?

These callbacks allow ASLR to be disabled completely in userspace. If a
poorly implemented auditor causes dynamic library loading to become
extremely predictable, an attacker may find a way to steal cryptographic
secrets stored in the bss segment. High-security container runtimes can't
easily protect against this ASLR-loss since the kernel is not involved.

Is there a *real* security risk here?
No idea, I have no clue if disabling ASLR in non-setuid applications is
really a problem. LD_PREFER_MAP_32BIT_EXEC exists after all. But I can say
there will be implications we do not yet (in this discussion) completely
understand.

> > An alternative solution I brought up in the prior discussion is "wrapping"
> > the mmap syscall. In general, any Linux syscall can be wrapped using seccomp
> > (e.g. via libseccomp [1]) or more recently with syscall user dispatch [2].
> > With the wrapper in place, every mmap would be replicated in the VM memory
> > window and update a table used for address translation. Some behavior
> > changes would be needed to appropriately implement MAP_ANONYMOUS and
> > MAP_FIXED, but neither seem particularly problematic.
>
> I don't understand what you mean.
> Besides the fact that you want to describe
> something very specific to particular libc
> (intercepting the particular mmap call, knowing
> how the particular dynamic loader works),
> you haven't written the detailed scheme of
> what you propose.
> ...
> So please detail your proposal.

Alright, here goes.

There are few syscalls on Linux that alter the page table for a process (you
can get a rough list by grepping the x86_64 syscall table in strace [1] for
"TM"). On x86_64, there are three common ones that add *new* pages to a
process: mmap(), mremap() and brk(). brk() and mremap() are most often used
through malloc() and realloc(), so your custom libc shim should catch them
even if you don't wrap them as syscalls. mmap() is far more common, both in
ld.so and in Glibc in general, so that's the main target here.

The general idea of the approach is to wrap mmap() and "mirror" the pages it
allocates "outside" the VM to pages "inside" the VM. In most cases
(~(MAP_ANONYMOUS|MAP_FIXED)), this should boil down to approximately:
  1. mmap() the "outside" pages,
  2. allocate some pages "inside" the VM to serve as the mirror pages,
  3. mmap() the "inside" pages with the same fd, size and offset (+ MAP_FIXED
     with addr as the "inside" target address), and
  4. update the address translation table to map "outside" to "inside" pages.

MAP_ANONYMOUS doesn't provide an fd to "mirror" the pages through, so the
wrapper will need to provide one. This can easily be a private memory-backed
file (e.g. memfd_create). Allocate some pages from this file before (1), and
use that for the fd and offset in the remaining steps.

MAP_FIXED specifies an addr, so instead of allocating pages in (2) the
wrapper will need to translate the provided addr from the "outside" to the
"inside" memory space. Usually the pages affected by an mmap(MAP_FIXED) will
have been previously allocated via an mmap(~MAP_FIXED) (recommended practice
from man mmap and implemented in _dl_map_segments), so this translation
should always succeed (the wrapper could also abort the application if this
precondition isn't met).

That's the basic approach. This approach wraps mmap() while conforming to
the Linux API, so it works for any segments that are mmap()'d. In GNU/Linux,
solibs and .bss are included in that set.

There are plenty of details that could be added, e.g. brk() could be
reimplemented in terms of mmap(memfd_create()), mremap() could be duplicated
in much the same way as mmap(), unimplemented/problematic syscalls can be
initially replaced with abort(), etc. For a preliminary solution on
GNU/Linux though, wrapping mmap() should be enough to create a duplicate
mapping of an solib.

[1]: https://gitlab.com/strace/strace/-/blob/master/src/linux/64/syscallent.h

> You were referring (in another thread) to
> trapping only the first mmap() call done by
> dynamic loader, IIRC. How can that lead to
> a solution of having 2 identical mappings,
> is essentially unclear. At best it can solve
> the problem of specifying the reloc address,

When I suggested that before, I was trying to solve the problem of
specifying the reloc address.
I thought that was the core use case at the time.

That said, the approach can be adjusted with little effort. In most cases
(~MAP_FIXED) the mmap() wrapper simply needs to:
  1. allocate some pages "inside" the VM to place the result,
  2. mmap() the pages with the same fd, size and offset (+ MAP_FIXED with
     addr as the "inside" target address), and
  3. return the "inside" target address.

MAP_ANONYMOUS requires no special handling, since the pages aren't being
mirrored in this case.

MAP_FIXED specifies an addr, so instead of (1) just use the given addr. The
wrapper could also abort the application if this addr is not "inside" the
VM.

The rest of the basic approach still holds. This approach wraps mmap() while
conforming to the Linux API, so it works for any segments that are mmap()'d.
In GNU/Linux, solibs and .bss are included in that set.

> by the cost of depending on a particular impl
> of particular libc, forgetting about any
> portability to other libces.

The only requirement of this approach is that the solibs (and .bss) are
mmap()'d following the recommendation in man mmap. This is true of GNU/Linux
(Glibc on Linux). I doubt there is another popular libc that doesn't mmap()
the solibs, but is there one you plan to support?

(I also doubt Glibc-only is a dealbreaker for you, given la_premap was a
viable solution and LD_AUDIT is basically a GNU extension at this point. :P)

Obviously this approach only works on Linux. Other OSs have their own
syscalls and methods for intercepting them. I only know Linux; was there
another OS you plan to support?

> > (In reply to Stas Sergeev from comment
> > You're right, neglected .bss when suggesting this idea. This would not be an
> > issue when using an mmap wrapper however, as the region is simply mapped
> > with MAP_ANONYMOUS.
>
> I don't understand how this would not be
> an issue, please clarify. Region mapped
> with MAP_ANONYMOUS cannot be shared with VM.

See the description above.
Pages mmap()'d with MAP_ANONYMOUS are mirrored via a memory-backed file to
allow sharing with the VM.

> > (In reply to Stas Sergeev from comment
> > To clarify here, the "first" call to mmap() is the one without MAP_FIXED,
> > and is used to allocate the pages that will later be overwritten by
> > MAP_FIXED. Threads should not become a problem here, just check the flags.
>
> Why any other thread can't do unrelated
> mmap() without MAP_FIXED?

See the description above. No information (except the address translation
table) needs to be persisted between mmap() calls, so it doesn't matter
which thread invokes mmap() when. The only requirement is that
mmap(MAP_FIXED) always overwrites pages previously allocated with
mmap(~MAP_FIXED), as recommended in man mmap and implemented in
_dl_map_segments.
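
For illustration, the MAP_ANONYMOUS mirroring described above could be
sketched roughly as follows. This is only a minimal sketch under stated
assumptions, not a tested wrapper: the names struct mirror,
reserve_window(), mirror_anonymous() and translate() are hypothetical, the
"VM window" is stood in for by a plain PROT_NONE reservation, and no syscall
interception is shown. It also swaps the caller's MAP_PRIVATE semantics for
MAP_SHARED on a memfd, which changes behavior across fork(). It assumes
Linux with a Glibc that provides memfd_create() (2.27 or later).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* One entry of a (hypothetical) address translation table. */
struct mirror {
    void *outside;  /* address returned to the caller of the wrapped mmap() */
    void *inside;   /* mirror address within the VM window */
    size_t len;
};

/* Stand-in for "allocate some pages inside the VM": reserve address space
 * with PROT_NONE so that a later MAP_FIXED only overwrites pages we own. */
static void *reserve_window(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Mirror an anonymous mapping through a memory-backed file.
 * Returns 0 on success and -1 on failure. */
static int mirror_anonymous(void *inside_addr, size_t len, int prot,
                            struct mirror *out)
{
    assert(inside_addr != NULL && out != NULL);

    /* MAP_ANONYMOUS has no fd, so provide one backed by memory. */
    int fd = memfd_create("mirror", MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    /* Step 1: the "outside" pages, placed wherever the kernel likes. */
    void *outside = mmap(NULL, len, prot, MAP_SHARED, fd, 0);
    if (outside == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Step 3: the "inside" pages, same fd and offset, at a fixed address. */
    void *inside = mmap(inside_addr, len, prot, MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);  /* both mappings keep the file contents alive */
    if (inside == MAP_FAILED) {
        munmap(outside, len);
        return -1;
    }

    /* Step 4: record the pair for later address translation. */
    out->outside = outside;
    out->inside = inside;
    out->len = len;
    return 0;
}

/* Translate an "outside" pointer to its "inside" mirror, or NULL if the
 * pointer does not fall within this entry. */
static void *translate(const struct mirror *m, const void *outside_ptr)
{
    uintptr_t p = (uintptr_t)outside_ptr;
    uintptr_t base = (uintptr_t)m->outside;
    if (p < base || p - base >= m->len)
        return NULL;
    return (char *)m->inside + (p - base);
}
```

A caller would reserve a window with reserve_window(), pass it to
mirror_anonymous(), and then see every store through the "outside" pointer
through the "inside" pointer as well, since both map the same file pages. A
real wrapper would keep many such entries and look them up on
mmap(MAP_FIXED), as described above.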