From: "janderson at rice dot edu" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug dynamic-link/30007] rfe: dlopen to specified address
Date: Sat, 18 Mar 2023 23:28:56 +0000
Message-ID: <bug-30007-131-NcWURcBW0N@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-30007-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=30007

--- Comment #15 from Jonathon Anderson <janderson at rice dot edu> ---
First, dealing with a few off-topic meta points:

(In reply to Stas Sergeev from comment #9)
> > If this well-understood approach solves the problem, IMHO there isn't much
> > point in arguing this RFE further.
> 
> It doesn't solve anything (except probably
> the reloc address), and the statements like
> this, together with the statement that my
> patch breaks your use-case or raises a
> security concerns, only suggests that you
> want to down-play any contributions that you
> review.
> In fact, since you never ever said a single
> word about how any of the multiple proposals
> (including DT_AUDIT for dlopen()) can be
> improved, I am quite sure its the case.
> I hope we can get more constructive.
As Adhemerval has already mentioned from the very start of this RFE (comment
#1):
> Any GNU extension requires a specific usercase that can't be easily accomplished with current API.
Thus, the first priority for this RFE should be to establish this use case and
express the failings of the current technology. A proposed patch series is
difficult to review and near-impossible to merge without reaching some kind of
consensus on these two points. Until this occurs, all but the most preliminary
work on a patch is a waste of many people's time and patience, including yours
as the author.

I also have limited time to investigate the details in my responses. In an
effort to remain useful (and succinct), I prioritize any discussion that will
lead closer to this first priority. This of course means I often cannot discuss
your contribution at length or make any suggestions; there is simply too much
otherwise to discuss with higher priorities, and it takes me multiple hours of
my spare time to collect that together in a cohesive reply. I hope you can
understand. :)

Coming back on topic, comment #8 establishes the succinct and sensible use case
for this RFE. This is half the requirement; what remains now is to express the
failings of the current technology for this use case. Comment #8 also describes
the high-level view of an alternative solution using current technology
available in GNU/Linux (Glibc on Linux). The next step then is to
discover where this solution fails for your specific use case.

It would be very constructive if you could investigate my proposed solution as
detailed below, and precisely express what the insurmountable problems with it
are. :D

(In reply to Stas Sergeev from comment #13)
> OK in fact that approach is so much
> better, that supporting pre-existing
> APIs is irrelevant here. Trying to
> fulfill someone's request on stackoverflow
> was a huge mis-goal.
> So... thanks for pointing that I was
> heading the wrong direction.
> Will implement a small and simple
> dlmem() with an extra ops arg, and w/o
> any audit machinery.
Although a bit late now, I would advise against pursuing dlmem() unless the
extra no-file-descriptor functionality is absolutely required for your use
case. There are many open questions about the API, and it is clear dlmem() will
have a far larger impact than la_premap* ever would.

If you need the no-file-descriptor functionality and do want to continue
dlmem(), I would recommend first developing a solid argument to assuage the
initial concerns raised by Carlos almost a month ago now
(https://sourceware.org/pipermail/libc-alpha/2023-February/145735.html).
Namely, establishing the use case in clear terms and expressing why the
alternative technology of "dlopenfd()" + memfd_create() fails to meet the use
case.


Coming back to the topic of the hour:

(In reply to Stas Sergeev from comment #9)
> > The proposed la_premap and la_premap_dlmem (part of the dlmem() patch)
> > collectively "solve" this problem by granting LD_AUDIT some limited control
> > over the object (segment) mapping process. My first impression from reading
> > the test cases, they seem a bit too specific to this use case. IMHO they are
> > also out-of-scope for LD_AUDIT: LD_AUDIT works at the level of symbols and
> > objects, both generic across OSs and even binary formats (ELF + DLL),
> > whereas la_premap* expose an implementation detail of the dynamic linker.
> 
> What exactly implementation detail?
> Its just "here's the length I need to
> map for solib. if you want, give me a
> buffer and/or fd for it".
> To me its quite similar to "here's the
> name of the solib. if you want, give
> me a different one to use".
Primarily, I am unclear what mmap flags an la_premap callback should use, or
in what order it should mmap to keep the page table consistent with multiple
threads (like _dl_map_segments). These are far deeper implementation details
than simply "here's the file path to use when looking up an solib," and will
depend heavily on the dynamic linker and OS it is compiled for. I do not
believe it makes sense to expose details this deep via LD_AUDIT.

On the API side, file descriptors are a concept specific to POSIX, and the ELF
standard (technically) does not require that the objects be mmap()'d. While I
do not believe there will be significant problems, it doesn't hurt to be kinder
to our non-Glibc friends, on the off-chance LD_AUDIT becomes significantly more
popular than it is today. :D

> > Most importantly, we do not yet deeply understand the implications exposing
> > these callbacks can have, security or otherwise.
> 
> Any explanation why there can be any
> security concerns here?
These callbacks allow ASLR to be disabled completely in userspace. If a poorly
implemented auditor causes dynamic library loading to become extremely
predictable, an attacker may find a way to steal cryptographic secrets stored
in the bss segment. High-security container runtimes can't easily protect
against this ASLR-loss since the kernel is not involved.

Is there a *real* security risk here? No idea, I have no clue if disabling ASLR
in non-setuid applications is really a problem. LD_PREFER_MAP_32BIT_EXEC exists
after all. But I can say there will be implications we do not yet (in this
discussion) completely understand.

> > An alternative solution I brought up in the prior discussion is "wrapping"
> > the mmap syscall. In general, any Linux syscall can be wrapped using seccomp
> > (e.g. via libseccomp [1]) or more recently with syscall user dispatch [2].
> > With the wrapper in place, every mmap would be replicated in the VM memory
> > window and update a table used for address translation. Some behavior
> > changes would be needed to appropriately implement MAP_ANONYMOUS and
> > MAP_FIXED, but neither seem particularly problematic.
> 
> I don't understand what you mean.
> Besides the fact that you want to describe
> something very specific to particular libc
> (intercepting the particular mmap call, knowing
> how the particular dynamic loader works),
> you haven't written the detailed scheme of
> what you propose.
> ...
> So please detail your proposal.
Alright, here goes.

There are a few syscalls on Linux that alter the page table for a process (you
can get a rough list by grepping the x86_64 syscall table in strace [1] for
"TM"). On x86_64, there are three common ones that add *new* pages to a
process: mmap(), mremap() and brk(). brk() and mremap() are most often used
through malloc() and realloc(), so your custom libc shim should catch them even
if you don't wrap them as syscalls. mmap() is far more common, both in ld.so
and in Glibc in general, so that's the main target here.

The general idea of the approach is to wrap mmap() and "mirror" the pages it
allocates "outside" the VM to pages "inside" the VM. In most cases
(~(MAP_ANONYMOUS|MAP_FIXED)), this should boil down to approximately:
  1. mmap() the "outside" pages,
  2. allocate some pages "inside" the VM to serve as the mirror pages,
  3. mmap() the "inside" pages with the same fd, size and offset (+ MAP_FIXED
     with addr as the "inside" target address), and
  4. update the address translation table to map "outside" to "inside" pages.
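
The four steps above can be sketched in C roughly as follows. This is only an
illustration under stated assumptions, not code from ld.so or from any patch
in this RFE: struct mirror_entry, mirror_mmap() and mirror_demo() are
hypothetical names, the "VM window" is stood in for by a plain PROT_NONE
reservation, and the demo uses MAP_SHARED on a memfd so that both views see
the same pages (with MAP_PRIVATE the two views would diverge after the first
write, which this sketch does not address).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical translation-table entry: one "outside" range and its mirror. */
struct mirror_entry {
    void  *outside;
    void  *inside;
    size_t len;
};

/*
 * Mirror one file-backed mapping. inside_addr must be page-aligned and must
 * lie in address space the caller already owns (the "VM window"), because
 * MAP_FIXED silently replaces whatever is mapped there.
 */
static int mirror_mmap(struct mirror_entry *e, void *inside_addr, size_t len,
                       int prot, int flags, int fd, off_t off)
{
    /* (1) the ordinary "outside" mapping, placed by the kernel */
    void *out = mmap(NULL, len, prot, flags, fd, off);
    if (out == MAP_FAILED)
        return -1;

    /* (3) same fd, size and offset, forced to the "inside" address */
    void *in = mmap(inside_addr, len, prot, flags | MAP_FIXED, fd, off);
    if (in == MAP_FAILED) {
        munmap(out, len);
        return -1;
    }

    /* (4) record the pair for later address translation */
    e->outside = out;
    e->inside = in;
    e->len = len;
    return 0;
}

/* Self-check: returns 0 if a write through one view is visible in the other. */
int mirror_demo(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int fd = memfd_create("mirror-demo", 0);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    /* (2) stand-in for "allocate some pages inside the VM" */
    void *window = mmap(NULL, len, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct mirror_entry e;
    if (window == MAP_FAILED ||
        mirror_mmap(&e, window, len, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0) != 0) {
        close(fd);
        return -1;
    }

    strcpy((char *)e.outside, "hello");
    int ok = e.inside == window && strcmp((char *)e.inside, "hello") == 0;

    munmap(e.outside, len);
    munmap(e.inside, len);
    close(fd);
    return ok ? 0 : -1;
}
```

Calling mirror_demo() from a small main() is enough to try it out; a real
wrapper would call something like mirror_mmap() from the intercepted syscall
and keep many entries instead of one.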

MAP_ANONYMOUS doesn't provide an fd to "mirror" the pages through, so the
wrapper will need to provide one. This can easily be a private memory-backed
file (e.g. memfd_create). Allocate some pages from this file before (1), and
use that for the fd and offset in the remaining steps.
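
As a rough sketch of that idea (the names struct anon_backing,
anon_backing_init(), anon_backing_alloc() and anon_demo() are made up for
illustration, and locking for multiple threads is deliberately left out), a
single memfd can be grown on demand and carved up into page ranges that stand
in for anonymous memory:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bump allocator over one memfd backing all "anonymous" pages. */
struct anon_backing {
    int   fd;
    off_t used;   /* bytes of the file handed out so far */
};

static int anon_backing_init(struct anon_backing *b)
{
    b->used = 0;
    b->fd = memfd_create("anon-backing", MFD_CLOEXEC);
    return b->fd < 0 ? -1 : 0;
}

/* Grow the file by len bytes (a page multiple); return the new range's offset. */
static off_t anon_backing_alloc(struct anon_backing *b, size_t len)
{
    off_t off = b->used;
    if (ftruncate(b->fd, off + (off_t)len) != 0)
        return (off_t)-1;
    b->used = off + (off_t)len;
    return off;
}

/* Self-check: returns 0 if the pages start zeroed and are shared by both views. */
int anon_demo(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    struct anon_backing b;
    if (anon_backing_init(&b) != 0)
        return -1;

    off_t off = anon_backing_alloc(&b, len);
    if (off == (off_t)-1) {
        close(b.fd);
        return -1;
    }

    /* What would have been mmap(..., MAP_PRIVATE | MAP_ANONYMOUS, -1, 0). */
    unsigned char *outside = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, b.fd, off);
    /* The mirror: same fd and offset, so the same physical pages. */
    unsigned char *inside = mmap(NULL, len, PROT_READ,
                                 MAP_SHARED, b.fd, off);
    if (outside == MAP_FAILED || inside == MAP_FAILED) {
        close(b.fd);
        return -1;
    }

    int zeroed = inside[0] == 0;   /* fresh file pages read as zero */
    outside[0] = 42;
    int shared = inside[0] == 42;  /* the write shows up in the mirror */

    munmap(outside, len);
    munmap(inside, len);
    close(b.fd);
    return (zeroed && shared) ? 0 : -1;
}
```

The file range is never returned to the allocator here; a fuller version would
need to handle munmap() as well.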

MAP_FIXED specifies an addr, so instead of allocating pages in (2) the wrapper
will need to translate the provided addr from the "outside" to "inside" memory
space. Usually the pages affected by an mmap(MAP_FIXED) will have been
previously allocated via an mmap(~MAP_FIXED) (recommended practice from man
mmap and implemented in _dl_map_segments), so this translation should always
succeed (the wrapper could also abort the application if this precondition
isn't met).
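
The translation step could look something like the sketch below. The table
layout and the names struct xlate_entry and xlate_outside_to_inside() are
assumptions for illustration only; a linear scan is used for brevity, and a
return value of 0 stands for "not covered", which is where a wrapper might
choose to abort.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation-table entry: [outside, outside + len) -> inside. */
struct xlate_entry {
    uintptr_t outside;
    uintptr_t inside;
    size_t    len;
};

/*
 * Translate an "outside" range to its "inside" address. Returns 0 when the
 * whole range [addr, addr + len) is not covered by a single entry.
 */
static uintptr_t xlate_outside_to_inside(const struct xlate_entry *tab,
                                         size_t n, uintptr_t addr, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        const struct xlate_entry *e = &tab[i];
        if (addr < e->outside)
            continue;
        uintptr_t delta = addr - e->outside;
        /* Written this way to avoid overflow in addr + len. */
        if (delta <= e->len && len <= e->len - delta)
            return e->inside + delta;
    }
    return 0;
}
```

For example, with a single entry mapping 0x1000..0x4000 "outside" to 0x100000
"inside", a MAP_FIXED request for one page at 0x2000 would be redirected to
0x101000, while a request reaching past 0x4000 would be rejected.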

That's the basic approach. This approach wraps mmap() while conforming to the
Linux API, so it works for any segments that are mmap()'d. In GNU/Linux, solibs
and .bss are included in that set.

There are plenty of details that could be added, e.g. brk() could be
reimplemented in terms of mmap(memfd_create()), mremap() could be duplicated in
much the same way as mmap(), unimplemented/problematic syscalls can be
initially replaced with abort(), etc. For a preliminary solution on GNU/Linux
though, wrapping mmap() should be enough to create a duplicate mapping of an
solib.

[1]: https://gitlab.com/strace/strace/-/blob/master/src/linux/64/syscallent.h

> You were referring (in another thread) to
> trapping only the first mmap() call done by
> dynamic loader, IIRC. How can that lead to
> a solution of having 2 identical mappings,
> is essentially unclear. At best it can solve
> the problem of specifying the reloc address,
When I suggested that before, I was trying to solve the problem of specifying
the reloc address. I thought that was the core use case at the time.

That said, the approach can be adjusted with little effort. In most cases
(~MAP_FIXED) the mmap() wrapper simply needs to:
  1. allocate some pages "inside" the VM to place the result,
  2. mmap() the pages with the same fd, size and offset (+ MAP_FIXED with addr
     as the "inside" target address), and
  3. return the "inside" target address.
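
A minimal sketch of this simpler variant, assuming the "VM" is a single
contiguous window handed out by a bump allocator: struct vm_window,
vm_window_init(), vm_mmap() and vm_demo() are hypothetical names, the window
is just a PROT_NONE reservation, and neither munmap() nor thread safety is
handled. The MAP_FIXED branch is included so the function covers every call,
and it aborts when the requested range is not inside the window.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical "VM window": a reserved range handed out front to back. */
struct vm_window {
    uintptr_t base;
    size_t    size;
    size_t    used;
    size_t    page;
};

static int vm_window_init(struct vm_window *w, size_t npages)
{
    w->page = (size_t)sysconf(_SC_PAGESIZE);
    w->size = npages * w->page;
    w->used = 0;
    void *p = mmap(NULL, w->size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    w->base = (uintptr_t)p;
    return 0;
}

/* mmap() replacement that always places the result inside the window. */
static void *vm_mmap(struct vm_window *w, void *addr, size_t len,
                     int prot, int flags, int fd, off_t off)
{
    size_t need = (len + w->page - 1) & ~(w->page - 1);  /* round up to pages */
    uintptr_t target;

    if (flags & MAP_FIXED) {
        /* Use the given addr, but insist that it lies inside the window. */
        target = (uintptr_t)addr;
        if (target < w->base || target - w->base > w->size ||
            need > w->size - (target - w->base))
            abort();
    } else {
        /* (1) allocate some pages "inside" the VM to place the result */
        if (need > w->size - w->used) {
            errno = ENOMEM;
            return MAP_FAILED;
        }
        target = w->base + w->used;
        w->used += need;
    }
    /* (2) same fd, size and offset, forced to the target; (3) return it */
    return mmap((void *)target, len, prot, flags | MAP_FIXED, fd, off);
}

/* Self-check: returns 0 if all mappings land where expected. */
int vm_demo(void)
{
    struct vm_window w;
    if (vm_window_init(&w, 16) != 0)
        return -1;

    int anon = MAP_PRIVATE | MAP_ANONYMOUS;
    char *a = vm_mmap(&w, NULL, w.page, PROT_READ | PROT_WRITE, anon, -1, 0);
    char *b = vm_mmap(&w, NULL, 1, PROT_READ | PROT_WRITE, anon, -1, 0);
    char *c = vm_mmap(&w, a, w.page, PROT_READ | PROT_WRITE,
                      anon | MAP_FIXED, -1, 0);
    if (a == MAP_FAILED || b == MAP_FAILED || c == MAP_FAILED)
        return -1;

    a[0] = 'x';
    int ok = (uintptr_t)a == w.base &&
             (uintptr_t)b == w.base + w.page &&
             c == a && a[0] == 'x';

    munmap((void *)w.base, w.size);
    return ok ? 0 : -1;
}
```

In a real setup vm_mmap() would be reached from the intercepted mmap syscall
rather than called directly, and the window would be the memory actually
shared with the VM.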

MAP_ANONYMOUS requires no special handling, since the pages aren't being
mirrored in this case.

MAP_FIXED specifies an addr, so instead of (1) just use the given addr.
The wrapper could also abort the application if this addr is not "inside" the
VM.

The rest of the basic approach still holds. This approach wraps mmap() while
conforming to the Linux API, so it works for any segments that are mmap()'d. In
GNU/Linux, solibs and .bss are included in that set.

> by the cost of depending on a particular impl
> of particular libc, forgetting about any
> portability to other libces.
The only requirement of this approach is that the solibs (and .bss) are
mmap()'d following the recommendation in man mmap. This is true of GNU/Linux
(Glibc on Linux). I doubt there is another popular libc that doesn't mmap() the
solibs, but is there one you plan to support? (I also doubt Glibc-only is a
dealbreaker for you, given la_premap was a viable solution and LD_AUDIT is
basically a GNU extension at this point. :P)

Obviously this approach only works on Linux. Other OSs have their own syscalls
and methods for intercepting them. I only know Linux; was there another OS you
plan to support?

> > (In reply to Stas Sergeev from comment
> > You're right, neglected .bss when suggesting this idea. This would not be an
> > issue when using an mmap wrapper however, as the region is simply mapped
> > with MAP_ANONYMOUS.
> 
> I don't understand how this would not be
> an issue, please clarify. Region mapped
> with MAP_ANONYMOUS cannot be shared with VM.
See the description above. Pages mmap()'d with MAP_ANONYMOUS are mirrored via a
memory-backed file to allow sharing with the VM.

> > (In reply to Stas Sergeev from comment
> > To clarify here, the "first" call to mmap() is the one without MAP_FIXED,
> > and is used to allocate the pages that will later be overwritten by
> > MAP_FIXED. Threads should not become a problem here, just check the flags.
> 
> Why any other thread can't do unrelated
> mmap() without MAP_FIXED?
See the description above. No information (except the address translation
table) needs to be persisted between mmap() calls, so it doesn't matter which
thread invokes mmap() when. The only requirement is that mmap(MAP_FIXED) always
overwrites pages previously allocated with mmap(~MAP_FIXED), as recommended in
man mmap and implemented in _dl_map_segments.
