[Bug dynamic-link/30007] rfe: dlopen to specified address

public inbox for glibc-bugs@sourceware.org
help / color / mirror / Atom feed

From: "janderson at rice dot edu" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug dynamic-link/30007] rfe: dlopen to specified address
Date: Tue, 28 Mar 2023 06:17:57 +0000	[thread overview]
Message-ID: <bug-30007-131-11y90EAoYI@http.sourceware.org/bugzilla/> (raw)
In-Reply-To: <bug-30007-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=30007

--- Comment #27 from Jonathon Anderson <janderson at rice dot edu> ---
(In reply to Stas Sergeev from comment #24)
> Could you please explain the concern
> itself? I mean, what problem is there
> to have an API to dlmem() from memory?
> Is it a security concern, or what kind of?
Briefly summarizing the main points from the original email in the mailing list
[1]:
> dlmem() works at a lower level of abstraction than the rest of the dl* APIs, i.e. memory instead of solibs/objects. That has widespread impacts across many users of Glibc, including but not limited to security, LD_AUDIT, and developer tools (GDB). Some reasons follow:
>   - dlmem() does not ensure that the passed memory is a correctly mmap()'d object. It would be strongly preferable that the API ensures we CAN'T end up in an inconsistent state, instead of making it UB if the user slips up.
>   - dlmem() removes the "file descriptor" abstraction out of the link_map. A lot of tooling has to change to fit this new reality, both inside and outside Glibc: LD_AUDIT, developer tools (e.g. GDB), etc.
>   - dlmem() skips many syscalls, removing the kernel-side auditable events required for security tooling. In contrast, "dlopenfd" requires both memfd_create() (or similar) and mmap() of that fd, allowing e.g. FFI/JIT to be locked down by a security seccomp filter.

Adding my own concern as well:
  - dlmem() seems to to expect the user to parse the program headers and mmap()
the binary as required. That requires the application to re-implement a core,
delicate piece of ld.so... and do so correctly. From an API design perspective,
that seems like a very poor choice of abstraction.

AFAICS none of these issues have been resolved in the latest patches. Some of
these issues are intrinsic to the dlmem() semantics. So if another, better API
will work for your case, that certainly would be preferred.

[1]: https://sourceware.org/pipermail/libc-alpha/2023-February/145735.html

> > The following API is close to your use case but doesn't raise the same
> > concerns as dlmem(). Does this solve your problem, if not what's missing?
> >     void *dlopen4(const char *filename, int flags, const struct dlopen4_args
> > *ext, size_t ext_size /* = sizeof(struct dlopen3_args) */);
> >     void *dlmopen5(Lmid_t lmid, const char *filename, int flags, const
> > struct dlopen4_args *ext, size_t ext_size /* = sizeof(struct dlopen3_args)
> > */);
> >     struct dlopen4_args {
> >       /* If not NULL, function called before mmap when loading the object
> > [and its dependencies?].
> >          Returns the base of a mmapped range of given length and alignment.
> > This mapping will be
> >          overwritten by the loaded object.  */
> >       void *(*dla_premap)(void *preferred_addr, size_t length, size_t align,
> > void *userdata);
> >       /* User data passed to dla_premap.  */
> >       void *dla_premap_userdata;
> >     };
> 
> The primary problem is that this API
> doesn't allow to preserve the user's
> mapping. It is only using that mapping
> to specify the reloc address, while
> dlmem() can optionally preserve it (I
> use the separate flag for that).
This is precisely one of the concerns with dlmem(). Why must the user's mapping
be preserved? So that the mirroring can be set up before the object is loaded?

Would replacing the dla_premap hook with some kind of custom-mmap()
(dla_mmap()) hook fit your use case better? That could allow you to set up
mirroring *as* the object is loaded, instead of before.

FWIW, do you need page-mirroring at all if you can just choose the reloc
address to be within the VM space?

> The secondary problem is "filename",
> but yes, I know you'll suggest to get
> it from /proc/self/fd.
I would prefer /proc/self/fd over dlopenfd4(). But dlopenfd() seems to be of
wider interest, so whatever works.

> I can remove all refactors and replace
> them with copy/pasts. Much bigger code
> but no change to existing code.
> Will that be any better?
> OTOH all refactors I did, just take some
> code chunk and move it to a separate func
> with the different indentation level.
> These diffs should be looked into with
> some tool that ignores indentation.
> Only then it would be clear how small they
> are.
I wouldn't waste any more time on the dlmem() patch until the concerns above
can be addressed.

> > As I mentioned before, syscall interception is a technique used in many
> > VM-adjacent and widely used technologies, to name a few: containers
> > Windows emulation (Wine), browser sandboxes
> > (Firefox/Chromium),
> 
> I wonder if the above ones are actually
> do the syscall interception, or just use
> the bpf filters to avoid malicious code
> from using syscalls?
None of the examples above do exactly what you're looking for. If I knew of any
OSS that did, I would just point you there. AFAIK your use case is very unique.

Of the examples I've named:
  - Wine outright implements Windows syscalls on Linux, by intercepting all
syscalls in the running process and performing the translation in userspace
(SIGSYS handler).
  - strace and GDB intercept syscalls "remotely" via ptrace(). IMHO the process
of poking the registers and memory via ptrace() is not all that different than
doing so from inside a SIGSYS signal handler.
  - Podman/Docker use libseccomp to filter syscalls with BPF seccomp() filters.
BPF isn't powerful enough for the proposed approach, but it is similar in that
it can alter the arguments and return values (to a limited extent).
  - Firefox and Chrom(ium) also use seccomp() filters, but they also register
special handlers for SIGSYS. IIRC it's mainly for error reporting and not for
interception, but you get the idea.

In short, intercepting syscalls is done in multiple OSS projects to varying
extents, for security and for profit. Wine is the only one that is as extreme
as your use case, but the rest do have some degree of similarity.

> > Given all this, I consider it much easier to write a syscall interception
> > code than to write a shim library to translate between 32- and 64-bit call
> > ABIs. FWIW. :D
> 
> Its a bit strange to intercept the syscalls
> of your own code. I am quite sure none of
> the projects you mentioned, actually do this.
> They intercept the syscalls of some 3rd-party
> code running along, but never their own syscalls.
Presumably you won't intercept (all of) your own syscalls, primarily you're
aiming for the syscalls while the 3rd-party "ancient code" is loading. So isn't
it pretty much the same?

> Most of dl_audit framework can be implemented
> with syscall interception, but why don't you
> want to do that?
Because (1) LD_AUDIT hearkens back to the days of Solaris and so is already on
literally every GNU/Linux box in active use, and because (2) symbol binding
(la_symbind) is done completely in userspace and can't be intercepted by
syscalls.

Very different situation.

> > > dlopenfd()+memfd doesn't give even the
> > > possibility of specifying the reloc address,
> > > and that's a very minimal, insufficient requirement.
> > Because you need the pages to be mirrored? Or is there another requirement
> > here?
> 
> Mirrored and also reloc address specified.
> AFAICT fdlopen()+memfd gives neither.
And based on prior comments, I assume you also want to preserve user mappings
here.

> > If flags contains MAP_PRIVATE, extra steps are once again needed. If this is
> > a read-only mapping (~PROT_WRITE) and assuming mprotect() is not used later
> > to add write access (IIRC I have not observed Glibc's ld.so do so with
> > strace),
> 
> But this is exactly what happens if you
> have unaligned SHT_NOBITS section. It
> goes to the same page that used MAP_PRIVATE
> to load an elf segment. glibc then re-protects
> and memsets that part. Even if you haven't
> seen that with strace, I was pointing to the
> exact code that does this.
Missed that comment, sorry. Link to the code so we're all on the same page: [2]

Note that the mprotect() calls are only if(__glibc_unlikely((c->prot &
PROT_WRITE) == 0)). It seems that newer ld places .data and small .bss in a RW
LOAD segment, which would explain why I've never observed it happen myself with
strace and modern software.

This makes me curious how old/common binaries are that trip this case. This
code (complete with the "Dag nab it" comment) have been present in Glibc since
1995: [3]. So maybe... *really* ancient binaries? :D

If it bothers you, this case can be ignored and the following case (that copies
the data to a writable anonymous file) used instead.

[2]:
https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/dl-map-segments.h;hb=07dd75589ecbedec5162a5645d57f8bd093a45db#l165
[3]:
https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/dl-load.c;hb=d66e34cd423425c348bcc83df127dd19711b0b9a#l339

> > then simply replace MAP_PRIVATE with MAP_SHARED in step (0) and the
> > rest will work.
> 
> If the page is never re-protected, then
> MAP_SHARED is not even needed. You can
> just have 2 private mappings from same file.
True!

> > Otherwise, if flags contains MAP_PRIVATE and prot contains PROT_WRITE, the
> 
> AFAIK there is no such case.
> PROT_WRITE is applied later with mprotect()
> if you have an unaligned SHT_NOBITS section,
> but is AFAICS never applied initially.
PROT_WRITE is applied initially if the LOAD segment is marked as RW. A quick
readelf -l on a few of my system's binaries seems to indicate this is pretty
common for .data and .bss in modern software.

> Contrary to what you say, no one is
> intercepting his own syscalls.
I beg to disagree. Many projects filter or intercept their own syscalls. This
*specific* approach hasn't been done before (I would point you to it if it
was), but intercepting (or at least filtering) syscalls in-process is nothing
new.

> And the SHT_NOBITS section problem is not
> yet addressed, although of course you will
> propose to intercept also mprotect() to get
> it in.
The most I would do in an mprotect() interception is ensure PROT_WRITE doesn't
get added to any pages (i.e. abort() the application if it does). That doesn't
really solve this problem, but it could catch some issues with the mirrored
pages. Maybe. :P

> Pages are not mirrored in case of a
> MAP_PRIVATE mapping that was later
> re-protected to r/w. Of course you
> can always use MAP_SHARED beforehand,
> and do a writable file copy,
Indeed! Which is exactly what I suggested. :D

> which will
> basically mean to just copy the initially
> memory-based solib into a file on hdd rather
> than to even properly use memfd.
Why is the HDD required here, can't you just copy to a memfd file? That's what
I suggested above.

> > IIRC ld.so only uses mprotect() to mark the RELRO segments as read-only, so
> > they don't need to be mirrored in a simple PoC implementation. At least for
> > simple cases, YMMV. 
> 
> Not sure if the unaligned SHT_NOBITS
> (that causes re-protect to R/W) is a
> "simple case" or not.
Well, it *seems* very uncommon in modern software. Not sure whether it's rare
in your "ancient code" case. Either way, solution discussed above.

> If indentation is ignored, then my patches
> touch a dozen of lines. There are just the
> moves of a large chunks of code to a separate
> funcs.
I more meant that comparing the hundreds of lines that have moved around is
time-consuming. There are tools to help, it just takes a lot of time that could
be better spent other places. Speaking from experience with my main project. :P

But it seems like your latest patches are shorter than I had remembered, I
stand corrected. IIRC at one point there was a 1300-addition patch, which is
where my comment came from, but that seems to have been cleaned up now. Great!
:D

(In reply to Stas Sergeev from comment #26)
> (In reply to Jonathon Anderson from comment #23)
> > These are niceties, but I think we can agree a direct implementation of
> > dlopen_with_offset() would be better for the use cases that need it. It
> > would also require far less refactors than dlmem().
> 
> Getting a bit more abstract here,
> why refactors are that bad? glibc
> is full of multi-thousands-line funcs
> intersected by gotos. Is this because
> the refactors are prohibited?
> I mean, I was hoping for a "thank you"
> for a couple of small refactors.
> Is the current glibc code style (huge
> spaghetti funcs) is intentional and
> enforced?
I don't run the show here... but AFAIK the code here is carefully, heavily,
manually optimized to generate the best performance with a wide range of C
compilers. Carelessly refactoring it and especially adding additional function
calls will destroy a lot of that work. (Although I dislike the spaghetti as
much as you do. :P)

I've seen other refactors merge from the mailing list, IIRC performance almost
always comes up in the leading discussion.

But again, the main problem with your patches is the concerns with the dlmem()
semantics, not the size nor quality of your patches themselves. So let's fix
that first.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

next prev parent reply	other threads:[~2023-03-28  6:17 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-16 14:13 [Bug dynamic-link/30007] New: rfe: dlopen to user buffer or " stsp at users dot sourceforge.net
2023-01-17 14:17 ` [Bug dynamic-link/30007] " adhemerval.zanella at linaro dot org
2023-01-17 14:35 ` stsp at users dot sourceforge.net
2023-01-19 13:23 ` adhemerval.zanella at linaro dot org
2023-01-19 13:46 ` stsp at users dot sourceforge.net
2023-01-20  6:55 ` [Bug dynamic-link/30007] rfe: dlopen " stsp at users dot sourceforge.net
2023-02-15 16:58 ` stsp at users dot sourceforge.net
2023-03-14  3:20 ` stsp at users dot sourceforge.net
2023-03-14 13:21 ` adhemerval.zanella at linaro dot org
2023-03-15  5:34 ` janderson at rice dot edu
2023-03-15  6:33 ` stsp at users dot sourceforge.net
2023-03-15 10:03 ` stsp at users dot sourceforge.net
2023-03-15 11:05 ` stsp at users dot sourceforge.net
2023-03-15 11:41 ` stsp at users dot sourceforge.net
2023-03-15 12:07 ` stsp at users dot sourceforge.net
2023-03-17  6:44 ` stsp at users dot sourceforge.net
2023-03-18 23:28 ` janderson at rice dot edu
2023-03-19  2:12 ` stsp at users dot sourceforge.net
2023-03-19  2:37 ` stsp at users dot sourceforge.net
2023-03-19  9:47 ` stsp at users dot sourceforge.net
2023-03-19 10:14 ` stsp at users dot sourceforge.net
2023-03-19 13:56 ` stsp at users dot sourceforge.net
2023-03-23 10:34 ` stsp at users dot sourceforge.net
2023-03-27  7:12 ` stsp at users dot sourceforge.net
2023-03-27 17:16 ` janderson at rice dot edu
2023-03-28  1:29 ` stsp at users dot sourceforge.net
2023-03-28  5:02 ` stsp at users dot sourceforge.net
2023-03-28  5:37 ` stsp at users dot sourceforge.net
2023-03-28  6:17 ` janderson at rice dot edu [this message]
2023-03-28  9:14 ` stsp at users dot sourceforge.net
2023-03-29 15:26 ` stsp at users dot sourceforge.net
2023-03-31 15:43 ` stsp at users dot sourceforge.net
2023-03-31 15:53 ` stsp at users dot sourceforge.net
2023-03-31 16:08 ` stsp at users dot sourceforge.net
2023-04-03  9:28 ` stsp at users dot sourceforge.net
2023-04-03  9:28 ` stsp at users dot sourceforge.net
2023-04-14 19:09 ` stsp at users dot sourceforge.net
2023-04-14 19:11 ` adhemerval.zanella at linaro dot org
2023-04-14 19:12 ` stsp at users dot sourceforge.net
2023-05-08 14:51 ` stsp at users dot sourceforge.net

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-30007-131-11y90EAoYI@http.sourceware.org/bugzilla/ \
    --to=sourceware-bugzilla@sourceware.org \
    --cc=glibc-bugs@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).